In this document, we build a logistic regression model of intervention decisions. This model is a psychometric function that estimates two parameters:

1. The point of subjective equality (PSE): the level of evidence at which users see the team as having equal utility with or without the new player. This parameter reflects bias in decision-making, either toward or away from intervening.
2. The just-noticeable difference (JND): the amount of evidence necessary to increase the user’s likelihood of intervening from 50% (at the PSE) to 75% (an arbitrary performance threshold). JNDs are inversely proportional to the slope of the linear model. They reflect the user’s sensitivity to evidence: smaller JNDs suggest that users evaluate prospects more consistently, while larger JNDs suggest a higher degree of noise in the perception of utility.
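To make these definitions concrete, both quantities fall out of simple algebra on the intercept and slope of a logistic regression on the log odds scale (a sketch with made-up parameter values, not fitted estimates):

```r
# hypothetical logistic regression parameters on the log odds scale
intercept <- 0.4  # bias toward intervening when evidence = 0
slope <- 1.5      # sensitivity to evidence

# PSE: the evidence level where P(intervene) = 50%,
# i.e., where intercept + slope * evidence = 0
pse <- -intercept / slope

# JND: the additional evidence needed to go from 50% to 75%;
# qlogis(0.75) = log(3), so the JND is inversely proportional to the slope
jnd <- qlogis(0.75) / slope
```

A negative PSE like this one indicates a bias toward intervening, and a steeper slope shrinks the JND.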
We load worker responses from our experiment and do some preprocessing.
# read in data
full_df <- read_csv("experiment-anonymous.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## workerId = col_character(),
## condition = col_character(),
## start_means = col_logical(),
## gender = col_character(),
## age = col_character(),
## education = col_character(),
## chart_use = col_character(),
## strategy_with_means = col_character(),
## strategy_without_means = col_character(),
## outcome = col_logical(),
## trial = col_character(),
## trialIdx = col_character()
## )
## See spec(...) for full column specifications.
# preprocessing
responses_df <- full_df %>%
rename( # rename to convert away from camel case
worker_id = workerId,
ground_truth = groundTruth,
sd_diff = sdDiff,
p_award_with = pAwardWith,
p_award_without = pAwardWithout,
account_value = accountValue,
p_superiority = pSup,
start_time = startTime,
resp_time = respTime,
trial_dur = trialDur,
trial_idx = trialIdx
) %>%
# remove practice and mock trials from responses dataframe, leave in full version
filter(trial_idx != "practice", trial_idx != "mock") %>%
# repair rows where intervene was erroneously recorded as -1
mutate(
intervene = if_else(intervene == -1,
# repair
if_else((payoff == (award_value - 1) | payoff == -1),
1, # payed for intervention
0), # didn't pay for intervention
# don't repair
as.numeric(intervene) # hack to avoid type error
)
) %>%
# set up factors for modeling
mutate(
# add a variable to note whether the chart they viewed showed means
means = as.factor((start_means & as.numeric(trial) <= (n_trials / 2)) | (!start_means & as.numeric(trial) > (n_trials / 2))),
start_means = as.factor(start_means),
sd_diff = as.factor(sd_diff),
trial_number = as.numeric(trial)
)
head(responses_df)
## # A tibble: 6 x 38
## worker_id batch n_trials n_data_conds condition baseline es_threshold
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 7819bfb6 17 34 18 intervals 0.5 0.9
## 2 7819bfb6 17 34 18 intervals 0.5 0.9
## 3 7819bfb6 17 34 18 intervals 0.5 0.9
## 4 7819bfb6 17 34 18 intervals 0.5 0.9
## 5 7819bfb6 17 34 18 intervals 0.5 0.9
## 6 7819bfb6 17 34 18 intervals 0.5 0.9
## # … with 31 more variables: start_means <fct>, award_value <dbl>,
## # starting_value <dbl>, exchange <dbl>, cutoff <dbl>, max_bonus <dbl>,
## # total_bonus <dbl>, duration <dbl>, numeracy <dbl>, gender <chr>, age <chr>,
## # education <chr>, chart_use <chr>, strategy_with_means <chr>,
## # strategy_without_means <chr>, account_value <dbl>, ground_truth <dbl>,
## # intervene <dbl>, outcome <lgl>, p_award_with <dbl>, p_award_without <dbl>,
## # p_superiority <dbl>, payoff <dbl>, resp_time <dbl>, sd_diff <fct>,
## # start_time <dbl>, trial <chr>, trial_dur <dbl>, trial_idx <chr>,
## # means <fct>, trial_number <dbl>
We need to prepare the data for modeling. This means calculating a scale of evidence in favor of intervention. We do this by applying a log odds transform to our utility-optimal decision rule, converting our evidence from differences of probabilities into log odds units, consistent with the idea that people perceive probabilities as log odds.
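As a worked example of this transform (with made-up probabilities and award value, not values from the experiment):

```r
# utility-optimal rule: intervene when the probability of the award with
# the new player exceeds the probability without, plus the cost of
# intervening expressed in probability units (1 / award_value)
p_with <- 0.85
p_without <- 0.70
award_value <- 4

# evidence in probability units: positive values favor intervening
p_diff <- p_with - (p_without + 1 / award_value)

# the same evidence in log odds units, as used in the model
evidence <- qlogis(p_with) - qlogis(p_without + 1 / award_value)
```

Here p_diff is -0.10, so intervening is not utility optimal, and the evidence is correspondingly negative on the log odds scale.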
model_df <- responses_df %>%
mutate(
# evidence scale
p_diff = p_award_with - (p_award_without + (1 / award_value)),
evidence = qlogis(p_award_with) - qlogis(p_award_without + (1 / award_value)),
# scale and center trial order
trial = (trial_number - as.numeric(n_trials) / 2) / as.numeric(n_trials)
)
model_df %>%
ggplot(aes(p_diff, evidence)) +
geom_line() +
geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
geom_hline(yintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") +
theme_bw() +
labs(
x = "Evidence in terms of probability",
y = "Evidence in terms of log odds"
)
Let’s look at the distribution of levels of evidence sampled on this scale. It should look approximately uniform.
model_df %>% ggplot(aes(x = evidence)) +
geom_histogram(fill = "black", binwidth = 0.25) +
geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
xlim(quantile(model_df$evidence, c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank())
## Warning: Removed 2 rows containing missing values (geom_bar).
Now, let’s apply our exclusion criteria, cutting our sample down to only the subset of participants who passed both attention checks.
# determine exclusions
exclude_df <- model_df %>%
# attention check trials where ground truth = c(0.5, 0.999)
mutate(failed_check = (ground_truth == 0.5 & intervene != 0) | (ground_truth == 0.999 & intervene != 1)) %>%
group_by(worker_id) %>%
summarise(
failed_attention_checks = sum(failed_check),
unique_p_sup = length(unique(p_superiority)),
# excluded if they failed either attention check or used fewer than three levels of the response scale
exclude = failed_attention_checks > 0 | unique_p_sup < 3
) %>%
dplyr::select(worker_id, exclude)
# apply exclusion criteria and remove attention check trials from modeling data set
model_df <- model_df %>%
left_join(exclude_df, by = "worker_id") %>%
filter(exclude == FALSE) %>%
filter(ground_truth > 0.5 & ground_truth < 0.999)
# how many remaining workers per condition?
model_df %>%
group_by(condition, start_means) %>% # between subject manipulations
summarise(
n_workers = length(unique(worker_id))
)
## # A tibble: 8 x 3
## # Groups: condition [4]
## condition start_means n_workers
## <chr> <fct> <int>
## 1 densities FALSE 79
## 2 densities TRUE 78
## 3 HOPs FALSE 79
## 4 HOPs TRUE 76
## 5 intervals FALSE 80
## 6 intervals TRUE 80
## 7 QDPs FALSE 77
## 8 QDPs TRUE 77
In addition to excluding participants who failed at least one of the two attention checks in the experiment, which is our preregistered exclusion criterion, we also exclude a handful of workers whose data lead to model fit issues. These are workers who responded with only one or two levels of the probability of superiority scale. We could make the case that these workers might not have been trying very hard when responding, but the reason for excluding them is more practical: the modeling process we are using cannot estimate random effects on response variability for these participants (i.e., you cannot calculate the variance of a set with only one or two distinct values). These random effects on variance are important because our data almost certainly violate a homogeneity of variance assumption.
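The practical problem is easy to demonstrate outside the modeling pipeline:

```r
# a worker who gives the same response on every trial has zero spread,
# and a lone response has no defined variance at all
sd(rep(50, 32))  # 0
sd(50)           # NA
```

With response sets like these, there is no within-worker variability for the model to estimate.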
Because of these exclusions, we are a few participants short of our target sample size of 80. We should still have more than enough data to support statistical inferences. Here we drop a handful of additional participants to maintain counterbalancing of block order. Since we know that some participants had dropped responses, let’s prioritize leaving out the workers with the greatest number of dropped trials in each counterbalancing condition.
model_df %>%
group_by(condition, start_means, worker_id) %>%
summarise(
n_trials = n(),
dropped_trials = 32 - n_trials
) %>%
filter(dropped_trials > 0)
## # A tibble: 5 x 5
## # Groups: condition, start_means [3]
## condition start_means worker_id n_trials dropped_trials
## <chr> <fct> <chr> <int> <dbl>
## 1 densities TRUE e4b46997 24 8
## 2 HOPs FALSE c488db75 5 27
## 3 HOPs FALSE ce016e09 25 7
## 4 HOPs FALSE f430e2e8 28 4
## 5 intervals FALSE ff8a2a69 28 4
Based on a comparison of the two tables above, we’ll drop workers c488db75, ce016e09, f430e2e8, and c337674a to ensure our ability to fit our model and to maintain counterbalancing of block order.
# remove workers with missing data, plus one where condition = densities, start_means = FALSE
model_df <- model_df %>%
filter(!worker_id %in% c("c488db75", "ce016e09", "f430e2e8", "c337674a"))
model_df %>%
group_by(condition, start_means) %>% # between subject manipulations
summarise(
n_workers = length(unique(worker_id))
)
## # A tibble: 8 x 3
## # Groups: condition [4]
## condition start_means n_workers
## <chr> <fct> <int>
## 1 densities FALSE 78
## 2 densities TRUE 78
## 3 HOPs FALSE 76
## 4 HOPs TRUE 76
## 5 intervals FALSE 80
## 6 intervals TRUE 80
## 7 QDPs FALSE 77
## 8 QDPs TRUE 77
Now we have our dataset ready for modeling.
We start as simply as possible by just modeling the distribution of decisions using a logit link function and an intercept model. Here we are just estimating the mean probability of intervening.
Before we fit the model to our data, let’s check that our priors seem reasonable. We’ll set a pretty wide prior on the intercept parameter and locate it at zero since this reflects a weakly informative expectation of no bias.
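We can sanity check what a normal(0, 1) prior on the intercept implies on the probability scale by pushing prior draws through the inverse logit (a quick standalone simulation):

```r
set.seed(1234)

# draws from the prior on the intercept, mapped to probabilities
prior_p <- plogis(rnorm(1e5, mean = 0, sd = 1))

# centered at 0.5 with most mass away from the extremes
quantile(prior_p, c(0.025, 0.5, 0.975))
```

The implied prior on the probability of intervening is centered at 0.5 (no bias), with roughly 95% of its mass between about 0.12 and 0.88.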
# get_prior(data = model_df,
# family = bernoulli(link = "logit"),
# formula = bf(intervene ~ 1))
# starting as simple as possible: learn the distribution of decisions
prior.logistic_intercept <- brm(data = model_df, family = bernoulli(link = "logit"),
formula = bf(intervene ~ 1),
prior = c(prior(normal(0, 1), class = Intercept)),
sample_prior = "only",
iter = 3000, warmup = 500, chains = 2, cores = 2)
## Compiling the C++ model
## Start sampling
Let’s look at our prior predictive distribution. This should just be an even split between 0 and 1.
# prior predictive check
model_df %>%
select(-intervene) %>%
add_predicted_draws(prior.logistic_intercept, prediction = "intervene", seed = 1234) %>%
ggplot(aes(x = intervene)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Prior predictive distribution for intervention decisions") +
theme(panel.grid = element_blank())
Now, let’s fit the model to our data.
# starting as simple as possible: learn the distribution of decisions
m.logistic_intercept <- brm(data = model_df, family = bernoulli(link = "logit"),
formula = bf(intervene ~ 1),
prior = c(prior(normal(0, 1), class = Intercept)),
iter = 3000, warmup = 500, chains = 2, cores = 2,
file = "model-fits/logistic_intercept_mdl")
Check diagnostics:
# trace plots
plot(m.logistic_intercept)
# model summary
print(m.logistic_intercept)
## Family: bernoulli
## Links: mu = logit
## Formula: intervene ~ 1
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
## total post-warmup samples = 5000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.36 0.01 0.33 0.39 1.00 1761 2440
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Let’s check out a posterior predictive distribution for intervention decisions.
# posterior predictive check
model_df %>%
select(-intervene) %>% # this model should not be sensitive to evidence
add_predicted_draws(m.logistic_intercept, prediction = "intervene", seed = 1234, n = 500) %>%
ggplot(aes(x = intervene)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior predictive distribution for intervention") +
theme(panel.grid = element_blank())
The posterior predictive distribution is about what we’d expect. The slight bias toward intervening is consistent with a positive intercept parameter.
How do the posterior predictions compare to the observed data?
# data density
model_df %>%
ggplot(aes(x = intervene)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Data distribution for intervention") +
theme(panel.grid = element_blank())
Since these look pretty good, even for an intercept model, it seems like other predictive checks will be a better indicator of fit going forward.
Let’s take a look at the estimated psychometric function. This should not have any slope. It is just an estimate of what proportion of the time people intervene.
model_df %>%
select(evidence, intervene) %>%
add_fitted_draws(m.logistic_intercept, value = "pf", n = 500) %>%
ggplot(aes(x = evidence, y = intervene)) +
geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
# geom_line(aes(y = pf, group = .draw)) +
stat_lineribbon(aes(y = pf), .width = c(.95, .80, .50), alpha = .25, fill = "black") +
geom_point(alpha = .15, color = "black") +
coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
ylim = quantile(model_df$intervene, c(0, 1))) +
theme_bw() +
theme(panel.grid.minor = element_blank())
Now we’ll add a slope parameter to our model, making it a simple linear model where decisions to intervene are a function of the evidence in favor of intervention.
Before we fit our model, let’s check that our priors seem reasonable. We’ll keep the same weak prior on our intercept parameter, and we’ll set a similarly weak prior on the sensitivity to evidence, centering our prior on a one-to-one correspondence between units of evidence and perceived evidence.
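We can also ask what a normal(1, 1) prior on sensitivity implies about JNDs, since the JND is qlogis(0.75) divided by the slope (a quick standalone simulation, restricted to positive slopes):

```r
set.seed(1234)

# prior draws for the sensitivity (slope) parameter
slope_prior <- rnorm(1e5, mean = 1, sd = 1)

# implied JNDs for the positive slopes the prior considers plausible
jnd_prior <- qlogis(0.75) / slope_prior[slope_prior > 0]
quantile(jnd_prior, c(0.25, 0.5, 0.75))
```

A slope near 1 implies a JND near log(3) ≈ 1.1 log odds units; steeper slopes imply smaller JNDs.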
# get_prior(data = model_df, family = bernoulli(link = "logit"), formula = bf(intervene ~ 1 + evidence))
# simple logistic regression
prior.logistic <- brm(data = model_df, family = bernoulli(link = "logit"),
formula = bf(intervene ~ 1 + evidence),
prior = c(prior(normal(0, 1), class = Intercept),
prior(normal(1, 1), class = b)),
sample_prior = "only",
iter = 3000, warmup = 500, chains = 2, cores = 2)
## Compiling the C++ model
## Start sampling
Let’s look at our prior predictive distribution. This should still be an even split between 0 and 1, although now our model should be sensitive to evidence.
# prior predictive check
model_df %>%
select(evidence) %>%
add_predicted_draws(prior.logistic, prediction = "intervene", seed = 1234, n = 500) %>%
ggplot(aes(x = intervene)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Prior predictive distribution for intervention decisions") +
theme(panel.grid = element_blank())
Perhaps a better way of seeing this is the predicted fit. We’ll look at this instead of prior predictions from here on.
# prior predictive check
model_df %>%
select(evidence) %>%
add_fitted_draws(prior.logistic, value = "pf", seed = 1234, n = 500) %>%
ggplot(aes(x = evidence, y = pf)) +
stat_lineribbon(.width = c(.95, .80, .50), alpha = .25, fill = "black") +
coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
ylim = quantile(model_df$intervene, c(0, 1))) +
labs(subtitle = "Prior predictive PF fit") +
theme(panel.grid = element_blank())
Now, let’s fit the model.
# logistic regression
m.logistic <- brm(data = model_df, family = bernoulli(link = "logit"),
formula = bf(intervene ~ 1 + evidence),
prior = c(prior(normal(0, 1), class = Intercept),
prior(normal(1, 1), class = b)),
iter = 3000, warmup = 500, chains = 2, cores = 2,
file = "model-fits/logistic_mdl")
Check diagnostics:
# trace plots
plot(m.logistic)
# pairs plot
pairs(m.logistic)
# model summary
print(m.logistic)
## Family: bernoulli
## Links: mu = logit
## Formula: intervene ~ 1 + evidence
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
## total post-warmup samples = 5000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.14 0.02 0.11 0.17 1.00 3919 3544
## evidence 0.82 0.01 0.79 0.85 1.00 3160 3281
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Let’s take a look at the estimated psychometric function.
model_df %>%
add_fitted_draws(m.logistic, value = "pf", n = 500) %>%
ggplot(aes(x = evidence, y = intervene)) +
geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
# geom_line(aes(y = pf, group = .draw)) +
stat_lineribbon(aes(y = pf), .width = c(.95, .80, .50), alpha = .25, fill = "black") +
geom_point(alpha = .15, color = "black") +
coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
ylim = quantile(model_df$intervene, c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank())
The models we’ve created thus far fail to account for much of the noise in the data. Here, we attempt to parse some heterogeneity in responses by modeling a random effect of worker on slopes and intercepts. This introduces a hierarchical component to our model in order to account for individual differences in the best fitting linear model for each worker’s data.
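Schematically, each worker’s linear predictor is the population-level intercept and slope plus worker-specific offsets (toy values for illustration, not fitted estimates):

```r
# population-level parameters on the log odds scale
b0 <- 0.4   # intercept
b1 <- 1.5   # slope

# one worker's random offsets from the population-level parameters
u_intercept <- -0.2
u_slope <- 0.3

# linear predictor and predicted probability for this worker
ev <- 0.5
eta <- (b0 + u_intercept) + (b1 + u_slope) * ev
plogis(eta)  # ~0.75
```

The standard deviations of the offsets across workers (and their correlation) are the group-level parameters the hierarchical model estimates.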
Before we fit our model, let’s check that our priors seem reasonable. We add priors for the standard deviation of random slopes and intercepts per worker and for their correlation. The priors on the random effect standard deviations are moderately weak, and the prior on their correlation is meant to avoid extreme correlations that could destabilize the model fit.
# get_prior(data = model_df, family = bernoulli(link = "logit"), formula = bf(intervene ~ (1 + evidence|worker_id) + evidence))
# hierarchical linear model with logit link
prior.wrkr.logistic <- brm(data = model_df, family = bernoulli(link = "logit"),
formula = bf(intervene ~ (1 + evidence|worker_id) + evidence),
prior = c(prior(normal(0, 1), class = Intercept),
prior(normal(1, 1), class = b),
prior(normal(0, 0.5), class = sd),
prior(lkj(4), class = cor)),
sample_prior = "only",
iter = 3000, warmup = 500, chains = 2, cores = 2)
## Compiling the C++ model
## Start sampling
Let’s look at predicted model fits to see the space of possible models predicted by our priors.
# prior predictive check
model_df %>%
select(evidence, worker_id) %>%
add_fitted_draws(prior.wrkr.logistic, value = "pf", seed = 1234, n = 500) %>%
ggplot(aes(x = evidence, y = pf)) +
stat_lineribbon(.width = c(.95, .80, .50), alpha = .25, fill = "black") +
coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
ylim = quantile(model_df$intervene, c(0, 1))) +
labs(subtitle = "Prior predictive PF fit") +
theme(panel.grid = element_blank())
# hierarchical linear model with logit link
m.wrkr.logistic <- brm(data = model_df, family = bernoulli(link = "logit"),
formula = bf(intervene ~ (1 + evidence|worker_id) + evidence),
prior = c(prior(normal(0, 1), class = Intercept),
prior(normal(1, 1), class = b),
prior(normal(0, 0.5), class = sd),
prior(lkj(4), class = cor)),
iter = 3000, warmup = 500, chains = 2, cores = 2,
file = "model-fits/logistic_mdl-wrkr")
Check diagnostics:
# trace plots
plot(m.wrkr.logistic)
# pairs plot
pairs(m.wrkr.logistic)
# model summary
print(m.wrkr.logistic)
## Family: bernoulli
## Links: mu = logit
## Formula: intervene ~ (1 + evidence | worker_id) + evidence
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
## total post-warmup samples = 5000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept) 1.29 0.05 1.20 1.39 1.00 1686
## sd(evidence) 0.95 0.05 0.86 1.05 1.00 1561
## cor(Intercept,evidence) 0.50 0.04 0.41 0.58 1.00 1960
## Tail_ESS
## sd(Intercept) 3186
## sd(evidence) 2532
## cor(Intercept,evidence) 2969
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.42 0.06 0.31 0.54 1.00 1482 2807
## evidence 1.51 0.05 1.41 1.61 1.00 2215 2966
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Let’s take a look at the estimated psychometric functions for each worker. When we make this kind of plot for model checks at the level of individual workers, we’ll look at a subset of workers to keep the number of charts generated manageable.
# two workers from each counterbalancing condition
model_check_set <- model_df %>%
group_by(start_means, condition, worker_id) %>%
summarise() %>%
top_n(2)
## Selecting by worker_id
model_check_set <- model_check_set$worker_id
model_check_df <- model_df %>%
filter(worker_id %in% model_check_set)
model_check_df %>%
group_by(worker_id) %>%
summarise()
## # A tibble: 16 x 1
## worker_id
## <chr>
## 1 f27ed3b6
## 2 f4f534e0
## 3 f5d48035
## 4 f796f54d
## 5 f7f69f44
## 6 f83e2827
## 7 fa0f4b94
## 8 fa22b8bb
## 9 fba3405d
## 10 fccb21d5
## 11 fd15ec30
## 12 fd3bea1b
## 13 fdb8555e
## 14 fe8936cd
## 15 fee45dce
## 16 ff8a2a69
model_check_df %>%
group_by(evidence, worker_id) %>%
add_fitted_draws(m.wrkr.logistic, value = "pf", n = 200) %>%
ggplot(aes(x = evidence, y = intervene, color = condition, fill = condition)) +
geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
stat_lineribbon(aes(y = pf), .width = c(.95), alpha = .25) +
geom_point(alpha = .15) +
scale_fill_brewer(type = "qual", palette = 1) +
scale_color_brewer(type = "qual", palette = 1) +
coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
ylim = quantile(model_df$intervene, c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(. ~ worker_id)
These fits are looking better. Let’s try expanding our model to include the manipulations in the experiment that we need to account for to answer our research questions.
We want to know whether adding means to visualizations biases the point of subjective equality or changes sensitivity to utility on average. We model these effects as a fixed intercept and a fixed slope (i.e., an interaction with the level of evidence).
Before we fit our model, let’s check that our priors seem reasonable. Now that we are modeling multiple slopes using dummy coding, we need to differentiate between the baseline slope when no means are shown (for which we use the same prior as before) and the effect of means on slopes (a difference from baseline, which should be located at 0 and get a moderately weak prior).
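To see what this dummy coding looks like, we can inspect the population-level design matrix R builds for the interaction formula (a sketch on toy data; brms uses the same formula interface):

```r
# toy data crossing two evidence values with the two levels of means
toy <- expand.grid(evidence = c(-1, 1), means = factor(c(FALSE, TRUE)))
mm <- model.matrix(~ evidence * means, data = toy)
colnames(mm)
# "(Intercept)" "evidence" "meansTRUE" "evidence:meansTRUE"
```

meansTRUE shifts the intercept (PSE) and evidence:meansTRUE shifts the slope (sensitivity) when means are shown, so the baseline terms describe the no-means condition.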
# get_prior(data = model_df, family = bernoulli(link = "logit"), formula = bf(intervene ~ (1 + evidence|worker_id) + evidence*means))
# hierarchical linear model with logit link and predictors for the presence/absence of means
prior.wrkr.means.logistic <- brm(data = model_df, family = bernoulli(link = "logit"),
formula = bf(intervene ~ (1 + evidence|worker_id) + evidence * means),
prior = c(prior(normal(0, 1), class = Intercept),
prior(normal(1, 1), class = b, coef = evidence),
prior(normal(0, 0.5), class = b),
prior(normal(0, 0.5), class = sd),
prior(lkj(4), class = cor)),
sample_prior = "only",
iter = 3000, warmup = 500, chains = 2, cores = 2)
## Compiling the C++ model
## Start sampling
Let’s look at predicted model fits to see the space of possible models predicted by our priors.
# prior predictive check
model_df %>%
select(evidence, worker_id, means) %>%
add_fitted_draws(prior.wrkr.means.logistic, value = "pf", seed = 1234, n = 500) %>%
ggplot(aes(x = evidence, y = pf)) +
stat_lineribbon(.width = c(.95, .80, .50), alpha = .25, fill = "black") +
coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
ylim = quantile(model_df$intervene, c(0, 1))) +
labs(subtitle = "Prior predictive PF fit") +
theme(panel.grid = element_blank())
# hierarchical linear model with logit link and predictors for the presence/absence of means
m.wrkr.means.logistic <- brm(data = model_df, family = bernoulli(link = "logit"),
formula = bf(intervene ~ (1 + evidence|worker_id) + evidence * means),
prior = c(prior(normal(0, 1), class = Intercept),
prior(normal(1, 1), class = b, coef = evidence),
prior(normal(0, 0.5), class = b),
prior(normal(0, 0.5), class = sd),
prior(lkj(4), class = cor)),
iter = 3000, warmup = 500, chains = 2, cores = 2,
file = "model-fits/logistic_mdl-wrkr_means")
Check diagnostics:
# trace plots
plot(m.wrkr.means.logistic)
# pairs plot
pairs(m.wrkr.means.logistic)
# model summary
print(m.wrkr.means.logistic)
## Family: bernoulli
## Links: mu = logit
## Formula: intervene ~ (1 + evidence | worker_id) + evidence * means
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
## total post-warmup samples = 5000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept) 1.29 0.05 1.20 1.39 1.00 1847
## sd(evidence) 0.95 0.05 0.86 1.05 1.00 1461
## cor(Intercept,evidence) 0.50 0.04 0.41 0.58 1.00 1932
## Tail_ESS
## sd(Intercept) 2805
## sd(evidence) 2271
## cor(Intercept,evidence) 2746
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.43 0.06 0.32 0.55 1.00 1617 2701
## evidence 1.57 0.05 1.46 1.67 1.00 2245 3290
## meansTRUE -0.02 0.04 -0.09 0.05 1.00 10552 3430
## evidence:meansTRUE -0.11 0.03 -0.17 -0.04 1.00 11413 3823
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Let’s take a look at the estimated psychometric functions for each worker.
model_check_df %>%
group_by(evidence, worker_id, means) %>%
add_fitted_draws(m.wrkr.means.logistic, value = "pf", n = 200) %>%
ggplot(aes(x = evidence, y = intervene, color = condition, fill = condition)) +
geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
stat_lineribbon(aes(y = pf), .width = c(.95), alpha = .25) +
geom_point(alpha = .15) +
scale_fill_brewer(type = "qual", palette = 1) +
scale_color_brewer(type = "qual", palette = 1) +
coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
ylim = quantile(model_df$intervene, c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(. ~ worker_id)
It looks like there may be some differences per visualization condition that our model is not capturing.
What do the posteriors for just-noticeable differences (JND) and points of subjective equality (PSE) look like when means are present vs absent? Since we are probing conditional expectations, we’ll forgo calculating marginal effects by manually combining parameters. Instead, we’ll use add_fitted_draws and compare_levels to get slopes and intercepts. Then we’ll use these to derive estimates of the JND and PSE for each fitted draw.
# get slopes from linear model
slopes_df <- model_df %>%
group_by(means) %>%
data_grid(evidence = c(0, 1)) %>%
add_fitted_draws(m.wrkr.means.logistic, re_formula = NA, scale = "linear") %>%
compare_levels(.value, by = evidence) %>%
rename(slope = .value)
# get intercepts from linear model
intercepts_df <- model_df %>%
group_by(means) %>%
data_grid(evidence = 0) %>%
add_fitted_draws(m.wrkr.means.logistic, re_formula = NA, scale = "linear") %>%
rename(intercept = .value)
# join dataframes for slopes and intercepts, calculate PSE and JND
stats_df <- slopes_df %>%
full_join(intercepts_df, by = c("means", ".draw")) %>%
mutate(
pse = -intercept / slope,
jnd = qlogis(0.75) / slope
)
First, let’s look at the estimates of JNDs per condition.
stats_df %>%
# compare_levels(jnd, by = means) %>%
# mutate(jnd_p_award = exp(jnd) / (1 / (unique(model_df$baseline) + 1 / unique(model_df$award_value)) - 1 + exp(jnd)) - unique(model_df$baseline) - 1 / unique(model_df$award_value)) %>%
ggplot(aes(x = jnd, group = means, color = means, fill = means)) +
geom_density(alpha = 0.35) +
# ggplot(aes(x = jnd_p_award)) +
# geom_density(alpha = 0.35, fill = "black") +
scale_x_continuous(expression(jnd), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior JND per condition") +
# labs(subtitle = "Posterior difference in JNDs with means present vs absent") +
theme(panel.grid = element_blank())
It looks like users are similarly sensitive to evidence whether means are present or absent, ignoring the effect of other manipulations.
Next, we’ll look at the point of subjective equality in each condition.
stats_df %>%
ggplot(aes(x = pse, group = means, color = means, fill = means)) +
geom_density(alpha = 0.35) +
scale_x_continuous(expression(pse), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior PSE per condition") +
theme(panel.grid = element_blank())
It looks like the point of subjective equality is less biased when means are present, ignoring the impact of other manipulations.
We expect that the level of uncertainty shown will impact judgments of effect size. Does it similarly affect the impact of means on decision-making? We test this by adding sd_diff to the interaction term in our model specification.
We’ll use the same priors as in our previous model, so we’ll jump right to fitting the model.
# hierarchical linear model with logit link and predictors for the presence/absence of means, level of uncertainty shown
m.wrkr.means.sd.logistic <- brm(data = model_df, family = bernoulli(link = "logit"),
formula = bf(intervene ~ (1 + evidence|worker_id) + evidence * means * sd_diff),
prior = c(prior(normal(0, 1), class = Intercept),
prior(normal(1, 1), class = b, coef = evidence),
prior(normal(0, 0.5), class = b),
prior(normal(0, 0.5), class = sd),
prior(lkj(4), class = cor)),
iter = 3000, warmup = 500, chains = 2, cores = 2,
file = "model-fits/logistic_mdl-wrkr_means_sd")
Check diagnostics:
# trace plots
plot(m.wrkr.means.sd.logistic)
# pairs plot
pairs(m.wrkr.means.sd.logistic, pars = c("b_"))
# pairs plot
pairs(m.wrkr.means.sd.logistic, pars = c("sd_worker_id", "cor_worker_id"))
print(m.wrkr.means.sd.logistic)
## Family: bernoulli
## Links: mu = logit
## Formula: intervene ~ (1 + evidence | worker_id) + evidence * means * sd_diff
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
## total post-warmup samples = 5000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept) 1.41 0.05 1.31 1.51 1.00 1675
## sd(evidence) 1.04 0.05 0.95 1.14 1.00 1492
## cor(Intercept,evidence) 0.50 0.04 0.42 0.58 1.00 1609
## Tail_ESS
## sd(Intercept) 2845
## sd(evidence) 2592
## cor(Intercept,evidence) 2706
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept -0.06 0.07 -0.20 0.07 1.00 1290
## evidence 1.48 0.06 1.36 1.60 1.00 1910
## meansTRUE -0.17 0.05 -0.27 -0.07 1.00 5682
## sd_diff15 1.16 0.06 1.05 1.27 1.00 5670
## evidence:meansTRUE -0.04 0.05 -0.13 0.05 1.00 5560
## evidence:sd_diff15 0.54 0.05 0.43 0.64 1.00 4891
## meansTRUE:sd_diff15 0.27 0.08 0.12 0.42 1.00 5093
## evidence:meansTRUE:sd_diff15 -0.17 0.07 -0.31 -0.02 1.00 4708
## Tail_ESS
## Intercept 2195
## evidence 3237
## meansTRUE 3922
## sd_diff15 4126
## evidence:meansTRUE 3972
## evidence:sd_diff15 4067
## meansTRUE:sd_diff15 3732
## evidence:meansTRUE:sd_diff15 3746
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Let’s take a look at the estimated psychometric functions for each worker.
model_check_df %>%
group_by(evidence, worker_id, means, sd_diff) %>%
add_fitted_draws(m.wrkr.means.sd.logistic, value = "pf", n = 200) %>%
ggplot(aes(x = evidence, y = intervene, color = condition, fill = condition)) +
geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
stat_lineribbon(aes(y = pf), .width = c(.95), alpha = .25) +
geom_point(alpha = .15) +
scale_fill_brewer(type = "qual", palette = 1) +
scale_color_brewer(type = "qual", palette = 1) +
coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
ylim = quantile(model_df$intervene, c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(. ~ worker_id)
It looks like there may be some differences per visualization condition that our model is not capturing.
What do the posteriors for just-noticeable differences (JND) and points of subjective equality (PSE) look like when means are present vs absent at different levels of uncertainty?
# get slopes from linear model
slopes_df <- model_df %>%
group_by(means, sd_diff) %>%
data_grid(evidence = c(0, 1)) %>%
add_fitted_draws(m.wrkr.means.sd.logistic, re_formula = NA, scale = "linear") %>%
compare_levels(.value, by = evidence) %>%
rename(slope = .value)
# get intercepts from linear model
intercepts_df <- model_df %>%
group_by(means, sd_diff) %>%
data_grid(evidence = 0) %>%
add_fitted_draws(m.wrkr.means.sd.logistic, re_formula = NA, scale = "linear") %>%
rename(intercept = .value)
# join dataframes for slopes and intercepts, calculate PSE and JND
stats_df <- slopes_df %>%
full_join(intercepts_df, by = c("means", "sd_diff", ".draw")) %>%
mutate(
pse = -intercept / slope,
jnd = qlogis(0.75) / slope
)
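As a quick sanity check on these identities (using arbitrary logit-scale coefficients, not our fitted draws), the PSE should map to a 50% probability of intervening, and the PSE plus one JND should map to 75%:

```r
# hypothetical logit-scale coefficients for illustration only
intercept <- -0.5
slope <- 1.5

pse <- -intercept / slope    # evidence level where P(intervene) = 0.5
jnd <- qlogis(0.75) / slope  # additional evidence needed to reach P = 0.75

# verify against the inverse-logit psychometric function
stopifnot(isTRUE(all.equal(plogis(intercept + slope * pse), 0.5)))
stopifnot(isTRUE(all.equal(plogis(intercept + slope * (pse + jnd)), 0.75)))
```

This is why JNDs are inversely proportional to the slope: a steeper psychometric function needs less evidence to move from 50% to 75%.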
First, let’s look at the estimates of JNDs per condition.
stats_df %>%
# compare_levels(jnd, by = means) %>%
# mutate(jnd_p_award = exp(jnd) / (1 / (unique(model_df$baseline) + 1 / unique(model_df$award_value)) - 1 + exp(jnd)) - unique(model_df$baseline) - 1 / unique(model_df$award_value)) %>%
ggplot(aes(x = jnd, group = means, color = means, fill = means)) +
geom_density(alpha = 0.35) +
# ggplot(aes(x = jnd_p_award)) +
# geom_density(alpha = 0.35, fill = "black") +
scale_x_continuous(expression(jnd), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior JND per condition") +
# labs(subtitle = "Posterior difference in JNDs with means present vs absent") +
theme(panel.grid = element_blank()) +
facet_grid(. ~ sd_diff)
It looks like means improve sensitivity to evidence when means are present and uncertainty is high, ignoring the effect of other manipulations. This is consistent with the debiasing effect of the mean on effect size judgments when uncertainty is high.
Next, we’ll look at the point of subjective equality in each condition.
stats_df %>%
ggplot(aes(x = pse, group = means, color = means, fill = means)) +
geom_density(alpha = 0.35) +
scale_x_continuous(expression(pse), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior PSE per condition") +
theme(panel.grid = element_blank()) +
facet_grid(. ~ sd_diff)
It looks like the point of subjective equality is slightly biased when means are present and uncertainty is low. However, when uncertainty is high, users are less biased without means and tend to lean away from intervention when means are present. This ignores the impact of other manipulations.
We also want to know whether different visualization conditions bias the point of subjective equality or change sensitivity to utility on average. Just like we did for the effect of extrinsic means, we model these effects as a fixed intercept and a fixed slope (i.e., an interaction with the level of evidence). This is the first model on our decision data that includes all the predictors required to answer our research questions (i.e., the minimal model on our path of model expansion).
We’ll use the same priors as we did for our previous model. Now, let’s fit our model.
# hierarchical linear model with logit link and predictors per visualization condition
m.m.logistic <- brm(data = model_df, family = bernoulli(link = "logit"),
formula = bf(intervene ~ (1 + evidence|worker_id) + evidence*means*sd_diff*condition),
prior = c(prior(normal(0, 1), class = Intercept),
prior(normal(1, 1), class = b, coef = evidence),
prior(normal(0, 0.5), class = b),
prior(normal(0, 0.5), class = sd),
prior(lkj(4), class = cor)),
iter = 5000, warmup = 2000, chains = 2, cores = 2, thin = 2,
file = "model-fits/logistic_mdl-minimal")
Check diagnostics:
# trace plots
plot(m.m.logistic)
# model summary
print(m.m.logistic)
## Family: bernoulli
## Links: mu = logit
## Formula: intervene ~ (1 + evidence | worker_id) + evidence * means * sd_diff * condition
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 5000; warmup = 2000; thin = 2;
## total post-warmup samples = 3000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept) 1.41 0.05 1.31 1.52 1.00 1611
## sd(evidence) 1.02 0.05 0.93 1.13 1.00 1432
## cor(Intercept,evidence) 0.50 0.04 0.41 0.58 1.00 1472
## Tail_ESS
## sd(Intercept) 1997
## sd(evidence) 2210
## cor(Intercept,evidence) 2162
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI
## Intercept 0.04 0.13 -0.21
## evidence 1.53 0.10 1.34
## meansTRUE -0.14 0.09 -0.32
## sd_diff15 0.97 0.10 0.79
## conditionHOPs -0.35 0.17 -0.71
## conditionintervals -0.37 0.17 -0.70
## conditionQDPs 0.32 0.18 -0.03
## evidence:meansTRUE -0.09 0.08 -0.26
## evidence:sd_diff15 0.37 0.09 0.19
## meansTRUE:sd_diff15 0.29 0.13 0.04
## evidence:conditionHOPs -0.27 0.14 -0.54
## evidence:conditionintervals -0.14 0.14 -0.42
## evidence:conditionQDPs 0.21 0.15 -0.08
## meansTRUE:conditionHOPs 0.10 0.13 -0.17
## meansTRUE:conditionintervals -0.01 0.14 -0.28
## meansTRUE:conditionQDPs -0.22 0.14 -0.49
## sd_diff15:conditionHOPs 0.53 0.14 0.25
## sd_diff15:conditionintervals 0.25 0.14 -0.02
## sd_diff15:conditionQDPs -0.02 0.14 -0.30
## evidence:meansTRUE:sd_diff15 -0.07 0.12 -0.31
## evidence:meansTRUE:conditionHOPs -0.03 0.12 -0.25
## evidence:meansTRUE:conditionintervals 0.14 0.12 -0.09
## evidence:meansTRUE:conditionQDPs 0.09 0.13 -0.15
## evidence:sd_diff15:conditionHOPs 0.22 0.14 -0.04
## evidence:sd_diff15:conditionintervals 0.30 0.13 0.04
## evidence:sd_diff15:conditionQDPs 0.16 0.14 -0.12
## meansTRUE:sd_diff15:conditionHOPs -0.41 0.19 -0.77
## meansTRUE:sd_diff15:conditionintervals 0.27 0.19 -0.12
## meansTRUE:sd_diff15:conditionQDPs 0.12 0.19 -0.24
## evidence:meansTRUE:sd_diff15:conditionHOPs -0.35 0.17 -0.69
## evidence:meansTRUE:sd_diff15:conditionintervals 0.14 0.18 -0.24
## evidence:meansTRUE:sd_diff15:conditionQDPs -0.04 0.19 -0.41
## u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.28 1.00 903 1762
## evidence 1.74 1.00 1215 2302
## meansTRUE 0.05 1.00 1728 2292
## sd_diff15 1.16 1.00 1656 2408
## conditionHOPs -0.01 1.00 1007 1779
## conditionintervals -0.04 1.00 1133 2003
## conditionQDPs 0.65 1.00 1047 1667
## evidence:meansTRUE 0.07 1.00 1852 2621
## evidence:sd_diff15 0.55 1.00 1788 2453
## meansTRUE:sd_diff15 0.54 1.00 1694 2310
## evidence:conditionHOPs 0.01 1.00 1340 2242
## evidence:conditionintervals 0.13 1.00 1405 1827
## evidence:conditionQDPs 0.50 1.00 1395 2042
## meansTRUE:conditionHOPs 0.35 1.00 1921 2510
## meansTRUE:conditionintervals 0.25 1.00 2037 2650
## meansTRUE:conditionQDPs 0.05 1.00 1857 2395
## sd_diff15:conditionHOPs 0.80 1.00 1874 2218
## sd_diff15:conditionintervals 0.52 1.00 1894 2481
## sd_diff15:conditionQDPs 0.26 1.00 2120 2320
## evidence:meansTRUE:sd_diff15 0.18 1.00 1813 2480
## evidence:meansTRUE:conditionHOPs 0.21 1.00 2150 2281
## evidence:meansTRUE:conditionintervals 0.38 1.00 2093 2348
## evidence:meansTRUE:conditionQDPs 0.33 1.00 1692 2530
## evidence:sd_diff15:conditionHOPs 0.49 1.00 2017 2422
## evidence:sd_diff15:conditionintervals 0.57 1.00 2172 2411
## evidence:sd_diff15:conditionQDPs 0.44 1.00 2141 2748
## meansTRUE:sd_diff15:conditionHOPs -0.05 1.00 2000 2288
## meansTRUE:sd_diff15:conditionintervals 0.64 1.00 2087 2481
## meansTRUE:sd_diff15:conditionQDPs 0.49 1.00 1919 2149
## evidence:meansTRUE:sd_diff15:conditionHOPs -0.01 1.00 2089 2644
## evidence:meansTRUE:sd_diff15:conditionintervals 0.51 1.00 2097 2655
## evidence:meansTRUE:sd_diff15:conditionQDPs 0.32 1.00 2024 2427
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Let’s take a look at the estimated psychometric functions for each worker.
model_check_df %>%
group_by(evidence, worker_id, means, sd_diff, condition) %>%
add_fitted_draws(m.m.logistic, value = "pf", n = 500) %>%
ggplot(aes(x = evidence, y = intervene, color = condition, fill = condition)) +
geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
stat_lineribbon(aes(y = pf), .width = c(.95), alpha = .25) +
geom_point(alpha = .15) +
scale_fill_brewer(type = "qual", palette = 1) +
scale_color_brewer(type = "qual", palette = 1) +
coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
ylim = quantile(model_df$intervene, c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(. ~ worker_id)
What do the posteriors for just-noticeable differences (JND) and points of subjective equality (PSE) look like? We’ll start by getting the slopes and intercepts of the linear model and using these to derive estimates of the JND and PSE for each fitted draw.
# get slopes from linear model
slopes_df <- model_df %>%
group_by(means, sd_diff, condition) %>%
data_grid(evidence = c(0, 1)) %>%
add_fitted_draws(m.m.logistic, re_formula = NA, scale = "linear") %>%
compare_levels(.value, by = evidence) %>%
rename(slope = .value)
# get intercepts from linear model
intercepts_df <- model_df %>%
group_by(means, sd_diff, condition) %>%
data_grid(evidence = 0) %>%
add_fitted_draws(m.m.logistic, re_formula = NA, scale = "linear") %>%
rename(intercept = .value)
# join dataframes for slopes and intercepts, calculate PSE and JND
stats_df <- slopes_df %>%
full_join(intercepts_df, by = c("means", "sd_diff", "condition", ".draw")) %>%
mutate(
pse = -intercept / slope,
jnd = qlogis(0.75) / slope
)
First, let’s look at the estimates of JNDs per visualization, marginalizing across other manipulations.
stats_df %>%
group_by(condition, .draw) %>% # marginalize out other manipulations
summarise(jnd = weighted.mean(jnd)) %>%
ggplot(aes(x = jnd, group = condition, color = condition, fill = condition)) +
geom_density(alpha = 0.35) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
scale_x_continuous(expression(jnd), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior JND per visualization") +
theme(panel.grid = element_blank())
It looks like users are most sensitive to evidence (i.e., JNDs are smaller) with quantile dotplots and least sensitive with HOPs.
What if we look at the full interaction of extrinsic means, level of uncertainty, and visualization condition on JNDs?
stats_df %>%
ggplot(aes(x = jnd, group = means, color = means, fill = means)) +
geom_density(alpha = 0.35) +
scale_x_continuous(expression(jnd), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior JND per condition") +
theme(panel.grid = element_blank()) +
facet_grid(condition ~ sd_diff)
It seems like means improve sensitivity to evidence slightly for intervals but reduce sensitivity for every other visualization condition, especially HOPs when uncertainty is high.
Next, we’ll look at the point of subjective equality for each visualization condition, marginalizing out other manipulations.
stats_df %>%
group_by(condition, .draw) %>% # marginalize out other manipulations
summarise(pse = weighted.mean(pse)) %>%
ggplot(aes(x = pse, group = condition, color = condition, fill = condition)) +
geom_density(alpha = 0.35) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
scale_x_continuous(expression(pse), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior PSE per visualization") +
theme(panel.grid = element_blank())
It looks like users are biased toward intervening in all conditions, but least so with intervals.
What if we look at the full interaction of extrinsic means, level of uncertainty, and visualization condition on PSE?
stats_df %>%
ggplot(aes(x = pse, group = means, color = means, fill = means)) +
geom_density(alpha = 0.35) +
scale_x_continuous(expression(pse), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior PSE per condition") +
theme(panel.grid = element_blank()) +
facet_grid(condition ~ sd_diff)
In all conditions, adding means biases people toward intervening when uncertainty is low and away from intervening when uncertainty is high. This only seems to result in more optimal decisions for densities and quantile dotplots when uncertainty is low.
In the minimal model to answer our research questions above, estimates for the effect of means are noisier than we would like. We’ll try to better account for heterogeneity across subjects by adding more random effects to our model for each within-subjects manipulation.
Following a principle of model expansion, we will make these changes cumulatively. We include a series of model specifications that capture plausible structure in the data and fit without any sampling issues.
This model adds random effects for the presence/absence of the mean and the level of uncertainty. This tells our model that each individual can respond differently to these factors, which may help to parse heterogeneity in responses. Note that we are unable to identify the full random-effects interaction of evidence, means, and level of uncertainty because we have only one observation per unique combination of these variables per worker. This is why the random effects formula does not match the fixed effects formula.
m.m.logistic.r_means.sd <- brm(
data = model_df, family = bernoulli(link = "logit"),
formula = bf(intervene ~ (1 + evidence + means*sd_diff|worker_id) + evidence*means*sd_diff*condition),
prior = c(prior(normal(0, 1), class = Intercept),
prior(normal(1, 1), class = b, coef = evidence),
prior(normal(0, 0.5), class = b),
prior(normal(0, 0.5), class = sd),
prior(lkj(4), class = cor)),
iter = 5000, warmup = 2000, chains = 2, cores = 2, thin = 2,
file = "model-fits/logistic_mdl-min-r_means_sd")
# trace plots
plot(m.m.logistic.r_means.sd)
summary(m.m.logistic.r_means.sd)
## Family: bernoulli
## Links: mu = logit
## Formula: intervene ~ (1 + evidence + means * sd_diff | worker_id) + evidence * means * sd_diff * condition
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 5000; warmup = 2000; thin = 2;
## total post-warmup samples = 3000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept) 1.93 0.08 1.78 2.09 1.00
## sd(evidence) 1.21 0.06 1.10 1.32 1.00
## sd(meansTRUE) 1.61 0.09 1.44 1.80 1.00
## sd(sd_diff15) 1.11 0.08 0.95 1.28 1.00
## sd(meansTRUE:sd_diff15) 0.66 0.14 0.39 0.93 1.00
## cor(Intercept,evidence) 0.31 0.05 0.21 0.41 1.00
## cor(Intercept,meansTRUE) -0.32 0.06 -0.44 -0.20 1.00
## cor(evidence,meansTRUE) 0.06 0.06 -0.07 0.18 1.00
## cor(Intercept,sd_diff15) -0.50 0.07 -0.62 -0.37 1.00
## cor(evidence,sd_diff15) 0.17 0.08 0.02 0.32 1.00
## cor(meansTRUE,sd_diff15) 0.29 0.08 0.13 0.44 1.00
## cor(Intercept,meansTRUE:sd_diff15) 0.04 0.16 -0.27 0.36 1.00
## cor(evidence,meansTRUE:sd_diff15) 0.34 0.14 0.07 0.62 1.00
## cor(meansTRUE,meansTRUE:sd_diff15) -0.55 0.12 -0.77 -0.29 1.00
## cor(sd_diff15,meansTRUE:sd_diff15) -0.15 0.17 -0.45 0.20 1.00
## Bulk_ESS Tail_ESS
## sd(Intercept) 2051 2478
## sd(evidence) 1736 2634
## sd(meansTRUE) 2124 2525
## sd(sd_diff15) 2011 2538
## sd(meansTRUE:sd_diff15) 1708 2448
## cor(Intercept,evidence) 1757 2473
## cor(Intercept,meansTRUE) 2383 2826
## cor(evidence,meansTRUE) 2341 2518
## cor(Intercept,sd_diff15) 2654 2732
## cor(evidence,sd_diff15) 2427 2512
## cor(meansTRUE,sd_diff15) 2196 2341
## cor(Intercept,meansTRUE:sd_diff15) 2545 2541
## cor(evidence,meansTRUE:sd_diff15) 2126 2469
## cor(meansTRUE,meansTRUE:sd_diff15) 2756 2300
## cor(sd_diff15,meansTRUE:sd_diff15) 2225 2706
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI
## Intercept -0.01 0.15 -0.31
## evidence 1.77 0.12 1.54
## meansTRUE -0.18 0.14 -0.46
## sd_diff15 1.14 0.12 0.90
## conditionHOPs -0.42 0.21 -0.81
## conditionintervals -0.42 0.21 -0.84
## conditionQDPs 0.31 0.21 -0.10
## evidence:meansTRUE -0.07 0.10 -0.26
## evidence:sd_diff15 0.36 0.11 0.16
## meansTRUE:sd_diff15 0.43 0.15 0.15
## evidence:conditionHOPs -0.29 0.16 -0.60
## evidence:conditionintervals -0.16 0.16 -0.47
## evidence:conditionQDPs 0.22 0.17 -0.11
## meansTRUE:conditionHOPs 0.13 0.20 -0.27
## meansTRUE:conditionintervals -0.01 0.20 -0.42
## meansTRUE:conditionQDPs -0.19 0.20 -0.58
## sd_diff15:conditionHOPs 0.59 0.18 0.23
## sd_diff15:conditionintervals 0.28 0.17 -0.05
## sd_diff15:conditionQDPs 0.01 0.18 -0.33
## evidence:meansTRUE:sd_diff15 -0.01 0.14 -0.27
## evidence:meansTRUE:conditionHOPs -0.02 0.13 -0.26
## evidence:meansTRUE:conditionintervals 0.17 0.13 -0.08
## evidence:meansTRUE:conditionQDPs 0.16 0.14 -0.13
## evidence:sd_diff15:conditionHOPs 0.20 0.14 -0.08
## evidence:sd_diff15:conditionintervals 0.30 0.15 0.01
## evidence:sd_diff15:conditionQDPs 0.19 0.15 -0.12
## meansTRUE:sd_diff15:conditionHOPs -0.44 0.21 -0.83
## meansTRUE:sd_diff15:conditionintervals 0.31 0.21 -0.09
## meansTRUE:sd_diff15:conditionQDPs 0.17 0.21 -0.24
## evidence:meansTRUE:sd_diff15:conditionHOPs -0.33 0.18 -0.69
## evidence:meansTRUE:sd_diff15:conditionintervals 0.16 0.20 -0.22
## evidence:meansTRUE:sd_diff15:conditionQDPs -0.04 0.20 -0.42
## u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.27 1.00 2723 2789
## evidence 2.00 1.00 2379 2785
## meansTRUE 0.09 1.00 2905 2613
## sd_diff15 1.38 1.00 2862 2667
## conditionHOPs 0.00 1.00 2743 2515
## conditionintervals -0.02 1.00 2385 2824
## conditionQDPs 0.73 1.00 2775 2730
## evidence:meansTRUE 0.11 1.00 2848 2906
## evidence:sd_diff15 0.58 1.00 2435 2829
## meansTRUE:sd_diff15 0.71 1.00 2475 2253
## evidence:conditionHOPs 0.03 1.00 2391 2477
## evidence:conditionintervals 0.17 1.00 2197 2718
## evidence:conditionQDPs 0.55 1.00 2509 2941
## meansTRUE:conditionHOPs 0.51 1.00 2729 2945
## meansTRUE:conditionintervals 0.38 1.00 2914 2450
## meansTRUE:conditionQDPs 0.22 1.00 2786 2417
## sd_diff15:conditionHOPs 0.93 1.00 2634 2746
## sd_diff15:conditionintervals 0.62 1.00 2661 2714
## sd_diff15:conditionQDPs 0.36 1.00 2878 2866
## evidence:meansTRUE:sd_diff15 0.26 1.00 2705 2539
## evidence:meansTRUE:conditionHOPs 0.23 1.00 2286 2519
## evidence:meansTRUE:conditionintervals 0.43 1.00 2517 2637
## evidence:meansTRUE:conditionQDPs 0.43 1.00 2750 2823
## evidence:sd_diff15:conditionHOPs 0.48 1.00 2180 2533
## evidence:sd_diff15:conditionintervals 0.59 1.00 2014 2338
## evidence:sd_diff15:conditionQDPs 0.49 1.00 2213 2650
## meansTRUE:sd_diff15:conditionHOPs -0.03 1.00 2395 2726
## meansTRUE:sd_diff15:conditionintervals 0.72 1.00 2480 2344
## meansTRUE:sd_diff15:conditionQDPs 0.57 1.00 2559 2362
## evidence:meansTRUE:sd_diff15:conditionHOPs 0.04 1.00 2439 2471
## evidence:meansTRUE:sd_diff15:conditionintervals 0.55 1.00 2313 2520
## evidence:meansTRUE:sd_diff15:conditionQDPs 0.34 1.00 2341 2671
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
This model adds mixed effects of trial order on JNDs and PSEs. This functionally adds a learning component to our model of the latent sense of utility on which decisions are based.
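To make the learning interpretation concrete, here is a minimal sketch (with made-up logit-scale coefficients, not our fitted estimates) of what an evidence-by-trial interaction implies: the effective slope at a given trial is the evidence slope plus the interaction term scaled by trial position, so when the interaction is positive, the implied JND shrinks as trials accumulate.

```r
# hypothetical fixed effects on the logit scale, for illustration only
b_evidence <- 1.5  # slope on evidence at trial = 0
b_ev_trial <- 1.0  # evidence-by-trial interaction (trial scaled to [0, 1])

# implied JND as a function of (scaled) trial position
jnd_at <- function(trial) qlogis(0.75) / (b_evidence + b_ev_trial * trial)

# sensitivity improves (JND shrinks) across trials when the interaction is positive
stopifnot(jnd_at(1) < jnd_at(0))
```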
# hierarchical logistic regression
m.m.logistic.r_means.sd.trial <- brm(
data = model_df, family = bernoulli(link = "logit"),
formula = bf(intervene ~ (1 + evidence*means*sd_diff + evidence*trial|worker_id) + evidence*means*sd_diff*condition + evidence*condition*trial),
prior = c(prior(normal(0, 1), class = Intercept),
prior(normal(1, 1), class = b, coef = evidence),
prior(normal(0, 0.5), class = b),
prior(normal(0, 0.5), class = sd),
prior(lkj(4), class = cor)),
iter = 5000, warmup = 2000, chains = 2, cores = 2, thin = 2,
file = "model-fits/logistic_mdl-min-r_means_sd_trial")
# trace plots
plot(m.m.logistic.r_means.sd.trial)
summary(m.m.logistic.r_means.sd.trial)
## Family: bernoulli
## Links: mu = logit
## Formula: intervene ~ (1 + evidence * means * sd_diff + evidence * trial | worker_id) + evidence * means * sd_diff * condition + evidence * condition * trial
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 5000; warmup = 2000; thin = 2;
## total post-warmup samples = 3000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error
## sd(Intercept) 1.90 0.09
## sd(evidence) 1.27 0.08
## sd(meansTRUE) 1.33 0.11
## sd(sd_diff15) 1.18 0.09
## sd(trial) 2.47 0.16
## sd(evidence:meansTRUE) 0.79 0.11
## sd(evidence:sd_diff15) 0.74 0.10
## sd(meansTRUE:sd_diff15) 0.76 0.16
## sd(evidence:trial) 1.49 0.16
## sd(evidence:meansTRUE:sd_diff15) 0.70 0.19
## cor(Intercept,evidence) 0.56 0.05
## cor(Intercept,meansTRUE) -0.15 0.09
## cor(evidence,meansTRUE) 0.01 0.09
## cor(Intercept,sd_diff15) -0.40 0.07
## cor(evidence,sd_diff15) -0.07 0.09
## cor(meansTRUE,sd_diff15) 0.26 0.09
## cor(Intercept,trial) 0.36 0.07
## cor(evidence,trial) 0.13 0.08
## cor(meansTRUE,trial) 0.20 0.08
## cor(sd_diff15,trial) -0.07 0.09
## cor(Intercept,evidence:meansTRUE) -0.21 0.11
## cor(evidence,evidence:meansTRUE) -0.10 0.13
## cor(meansTRUE,evidence:meansTRUE) 0.45 0.12
## cor(sd_diff15,evidence:meansTRUE) 0.32 0.12
## cor(trial,evidence:meansTRUE) 0.14 0.12
## cor(Intercept,evidence:sd_diff15) -0.37 0.11
## cor(evidence,evidence:sd_diff15) -0.09 0.12
## cor(meansTRUE,evidence:sd_diff15) -0.07 0.13
## cor(sd_diff15,evidence:sd_diff15) 0.65 0.09
## cor(trial,evidence:sd_diff15) -0.13 0.12
## cor(evidence:meansTRUE,evidence:sd_diff15) 0.20 0.15
## cor(Intercept,meansTRUE:sd_diff15) -0.06 0.15
## cor(evidence,meansTRUE:sd_diff15) 0.32 0.14
## cor(meansTRUE,meansTRUE:sd_diff15) -0.15 0.16
## cor(sd_diff15,meansTRUE:sd_diff15) 0.06 0.16
## cor(trial,meansTRUE:sd_diff15) -0.03 0.15
## cor(evidence:meansTRUE,meansTRUE:sd_diff15) 0.17 0.18
## cor(evidence:sd_diff15,meansTRUE:sd_diff15) 0.22 0.17
## cor(Intercept,evidence:trial) 0.28 0.09
## cor(evidence,evidence:trial) 0.41 0.09
## cor(meansTRUE,evidence:trial) 0.02 0.12
## cor(sd_diff15,evidence:trial) -0.20 0.11
## cor(trial,evidence:trial) 0.50 0.09
## cor(evidence:meansTRUE,evidence:trial) 0.19 0.13
## cor(evidence:sd_diff15,evidence:trial) -0.13 0.13
## cor(meansTRUE:sd_diff15,evidence:trial) 0.16 0.16
## cor(Intercept,evidence:meansTRUE:sd_diff15) -0.23 0.14
## cor(evidence,evidence:meansTRUE:sd_diff15) 0.08 0.16
## cor(meansTRUE,evidence:meansTRUE:sd_diff15) 0.05 0.17
## cor(sd_diff15,evidence:meansTRUE:sd_diff15) 0.09 0.17
## cor(trial,evidence:meansTRUE:sd_diff15) -0.04 0.16
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15) 0.02 0.19
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15) 0.09 0.19
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15) 0.50 0.17
## cor(evidence:trial,evidence:meansTRUE:sd_diff15) -0.05 0.18
## l-95% CI u-95% CI Rhat
## sd(Intercept) 1.73 2.08 1.00
## sd(evidence) 1.12 1.42 1.00
## sd(meansTRUE) 1.10 1.55 1.01
## sd(sd_diff15) 1.01 1.36 1.00
## sd(trial) 2.17 2.79 1.00
## sd(evidence:meansTRUE) 0.56 1.01 1.00
## sd(evidence:sd_diff15) 0.56 0.95 1.00
## sd(meansTRUE:sd_diff15) 0.44 1.07 1.00
## sd(evidence:trial) 1.18 1.80 1.00
## sd(evidence:meansTRUE:sd_diff15) 0.32 1.04 1.00
## cor(Intercept,evidence) 0.46 0.66 1.00
## cor(Intercept,meansTRUE) -0.31 0.02 1.00
## cor(evidence,meansTRUE) -0.16 0.18 1.00
## cor(Intercept,sd_diff15) -0.54 -0.25 1.00
## cor(evidence,sd_diff15) -0.24 0.10 1.00
## cor(meansTRUE,sd_diff15) 0.07 0.44 1.00
## cor(Intercept,trial) 0.23 0.49 1.00
## cor(evidence,trial) -0.01 0.28 1.00
## cor(meansTRUE,trial) 0.03 0.35 1.00
## cor(sd_diff15,trial) -0.23 0.12 1.00
## cor(Intercept,evidence:meansTRUE) -0.43 -0.00 1.00
## cor(evidence,evidence:meansTRUE) -0.34 0.16 1.00
## cor(meansTRUE,evidence:meansTRUE) 0.21 0.65 1.00
## cor(sd_diff15,evidence:meansTRUE) 0.09 0.55 1.00
## cor(trial,evidence:meansTRUE) -0.10 0.36 1.00
## cor(Intercept,evidence:sd_diff15) -0.57 -0.16 1.00
## cor(evidence,evidence:sd_diff15) -0.32 0.15 1.00
## cor(meansTRUE,evidence:sd_diff15) -0.32 0.19 1.00
## cor(sd_diff15,evidence:sd_diff15) 0.46 0.82 1.00
## cor(trial,evidence:sd_diff15) -0.36 0.10 1.00
## cor(evidence:meansTRUE,evidence:sd_diff15) -0.09 0.48 1.00
## cor(Intercept,meansTRUE:sd_diff15) -0.36 0.23 1.00
## cor(evidence,meansTRUE:sd_diff15) 0.04 0.58 1.00
## cor(meansTRUE,meansTRUE:sd_diff15) -0.45 0.18 1.00
## cor(sd_diff15,meansTRUE:sd_diff15) -0.25 0.39 1.00
## cor(trial,meansTRUE:sd_diff15) -0.32 0.26 1.00
## cor(evidence:meansTRUE,meansTRUE:sd_diff15) -0.19 0.50 1.00
## cor(evidence:sd_diff15,meansTRUE:sd_diff15) -0.12 0.55 1.00
## cor(Intercept,evidence:trial) 0.10 0.46 1.00
## cor(evidence,evidence:trial) 0.22 0.59 1.00
## cor(meansTRUE,evidence:trial) -0.21 0.25 1.00
## cor(sd_diff15,evidence:trial) -0.41 0.00 1.00
## cor(trial,evidence:trial) 0.32 0.67 1.00
## cor(evidence:meansTRUE,evidence:trial) -0.06 0.44 1.00
## cor(evidence:sd_diff15,evidence:trial) -0.37 0.13 1.00
## cor(meansTRUE:sd_diff15,evidence:trial) -0.16 0.47 1.00
## cor(Intercept,evidence:meansTRUE:sd_diff15) -0.49 0.07 1.00
## cor(evidence,evidence:meansTRUE:sd_diff15) -0.23 0.38 1.00
## cor(meansTRUE,evidence:meansTRUE:sd_diff15) -0.29 0.38 1.00
## cor(sd_diff15,evidence:meansTRUE:sd_diff15) -0.24 0.43 1.00
## cor(trial,evidence:meansTRUE:sd_diff15) -0.35 0.26 1.00
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15) -0.35 0.41 1.00
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15) -0.28 0.47 1.00
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15) 0.12 0.77 1.00
## cor(evidence:trial,evidence:meansTRUE:sd_diff15) -0.40 0.30 1.00
## Bulk_ESS Tail_ESS
## sd(Intercept) 1953 2802
## sd(evidence) 2181 2683
## sd(meansTRUE) 1106 1899
## sd(sd_diff15) 2039 2532
## sd(trial) 1645 2510
## sd(evidence:meansTRUE) 871 1786
## sd(evidence:sd_diff15) 1572 2440
## sd(meansTRUE:sd_diff15) 927 1712
## sd(evidence:trial) 1609 1883
## sd(evidence:meansTRUE:sd_diff15) 706 883
## cor(Intercept,evidence) 2036 2261
## cor(Intercept,meansTRUE) 2014 2139
## cor(evidence,meansTRUE) 1747 2203
## cor(Intercept,sd_diff15) 2189 2828
## cor(evidence,sd_diff15) 1893 2647
## cor(meansTRUE,sd_diff15) 1358 2160
## cor(Intercept,trial) 2009 2371
## cor(evidence,trial) 1558 2312
## cor(meansTRUE,trial) 1614 2369
## cor(sd_diff15,trial) 1433 1868
## cor(Intercept,evidence:meansTRUE) 2016 2450
## cor(evidence,evidence:meansTRUE) 1955 2121
## cor(meansTRUE,evidence:meansTRUE) 1271 1767
## cor(sd_diff15,evidence:meansTRUE) 1470 2139
## cor(trial,evidence:meansTRUE) 1352 2176
## cor(Intercept,evidence:sd_diff15) 2197 2269
## cor(evidence,evidence:sd_diff15) 2060 2465
## cor(meansTRUE,evidence:sd_diff15) 1646 2051
## cor(sd_diff15,evidence:sd_diff15) 1748 2598
## cor(trial,evidence:sd_diff15) 1920 2319
## cor(evidence:meansTRUE,evidence:sd_diff15) 1861 2017
## cor(Intercept,meansTRUE:sd_diff15) 2086 2464
## cor(evidence,meansTRUE:sd_diff15) 1892 2626
## cor(meansTRUE,meansTRUE:sd_diff15) 1857 2041
## cor(sd_diff15,meansTRUE:sd_diff15) 1740 2736
## cor(trial,meansTRUE:sd_diff15) 1957 2502
## cor(evidence:meansTRUE,meansTRUE:sd_diff15) 1206 1549
## cor(evidence:sd_diff15,meansTRUE:sd_diff15) 1439 1933
## cor(Intercept,evidence:trial) 2211 2331
## cor(evidence,evidence:trial) 1942 2464
## cor(meansTRUE,evidence:trial) 1392 2021
## cor(sd_diff15,evidence:trial) 1457 2420
## cor(trial,evidence:trial) 1564 2245
## cor(evidence:meansTRUE,evidence:trial) 1606 2290
## cor(evidence:sd_diff15,evidence:trial) 1512 2147
## cor(meansTRUE:sd_diff15,evidence:trial) 1243 1936
## cor(Intercept,evidence:meansTRUE:sd_diff15) 1924 2517
## cor(evidence,evidence:meansTRUE:sd_diff15) 2053 2015
## cor(meansTRUE,evidence:meansTRUE:sd_diff15) 2213 2662
## cor(sd_diff15,evidence:meansTRUE:sd_diff15) 2065 2448
## cor(trial,evidence:meansTRUE:sd_diff15) 2134 1895
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15) 1602 2522
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15) 1665 2435
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15) 1421 1904
## cor(evidence:trial,evidence:meansTRUE:sd_diff15) 2113 2823
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI
## Intercept 0.08 0.15 -0.21
## evidence 1.92 0.13 1.67
## meansTRUE -0.19 0.15 -0.48
## sd_diff15 1.27 0.14 1.00
## conditionHOPs -0.38 0.21 -0.78
## conditionintervals -0.43 0.21 -0.84
## conditionQDPs 0.41 0.22 -0.01
## trial 1.07 0.21 0.66
## evidence:meansTRUE -0.01 0.14 -0.29
## evidence:sd_diff15 0.67 0.14 0.39
## meansTRUE:sd_diff15 0.66 0.17 0.35
## evidence:conditionHOPs -0.33 0.17 -0.65
## evidence:conditionintervals -0.14 0.17 -0.48
## evidence:conditionQDPs 0.29 0.17 -0.05
## meansTRUE:conditionHOPs 0.07 0.21 -0.34
## meansTRUE:conditionintervals 0.01 0.21 -0.41
## meansTRUE:conditionQDPs -0.31 0.21 -0.72
## sd_diff15:conditionHOPs 0.54 0.19 0.18
## sd_diff15:conditionintervals 0.24 0.19 -0.12
## sd_diff15:conditionQDPs 0.01 0.20 -0.37
## evidence:trial 1.39 0.20 1.01
## conditionHOPs:trial -0.09 0.29 -0.65
## conditionintervals:trial 0.61 0.29 0.06
## conditionQDPs:trial 0.34 0.30 -0.25
## evidence:meansTRUE:sd_diff15 0.25 0.19 -0.12
## evidence:meansTRUE:conditionHOPs -0.01 0.17 -0.36
## evidence:meansTRUE:conditionintervals 0.21 0.18 -0.16
## evidence:meansTRUE:conditionQDPs 0.12 0.19 -0.26
## evidence:sd_diff15:conditionHOPs 0.17 0.18 -0.16
## evidence:sd_diff15:conditionintervals 0.22 0.18 -0.14
## evidence:sd_diff15:conditionQDPs 0.23 0.19 -0.15
## meansTRUE:sd_diff15:conditionHOPs -0.39 0.22 -0.82
## meansTRUE:sd_diff15:conditionintervals 0.41 0.23 -0.01
## meansTRUE:sd_diff15:conditionQDPs 0.17 0.23 -0.29
## evidence:conditionHOPs:trial -0.72 0.25 -1.21
## evidence:conditionintervals:trial 0.62 0.25 0.12
## evidence:conditionQDPs:trial 0.30 0.27 -0.24
## evidence:meansTRUE:sd_diff15:conditionHOPs -0.36 0.22 -0.79
## evidence:meansTRUE:sd_diff15:conditionintervals 0.34 0.24 -0.12
## evidence:meansTRUE:sd_diff15:conditionQDPs -0.06 0.24 -0.54
## u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.38 1.00 1994 2394
## evidence 2.18 1.00 2170 2574
## meansTRUE 0.09 1.00 2230 2473
## sd_diff15 1.54 1.00 2575 2866
## conditionHOPs 0.04 1.00 2054 2223
## conditionintervals -0.02 1.00 2339 2715
## conditionQDPs 0.82 1.00 2083 2495
## trial 1.46 1.00 2573 2647
## evidence:meansTRUE 0.27 1.00 2476 2723
## evidence:sd_diff15 0.95 1.00 2048 2108
## meansTRUE:sd_diff15 1.00 1.00 2710 2791
## evidence:conditionHOPs 0.01 1.00 2088 2580
## evidence:conditionintervals 0.18 1.00 2193 2310
## evidence:conditionQDPs 0.63 1.00 2422 2680
## meansTRUE:conditionHOPs 0.49 1.00 2854 2674
## meansTRUE:conditionintervals 0.42 1.00 2613 2653
## meansTRUE:conditionQDPs 0.10 1.00 2461 2370
## sd_diff15:conditionHOPs 0.92 1.00 2433 2721
## sd_diff15:conditionintervals 0.62 1.00 2480 2768
## sd_diff15:conditionQDPs 0.40 1.00 2569 2601
## evidence:trial 1.78 1.00 2519 2663
## conditionHOPs:trial 0.49 1.00 2914 2827
## conditionintervals:trial 1.18 1.00 2832 2829
## conditionQDPs:trial 0.94 1.00 2734 2693
## evidence:meansTRUE:sd_diff15 0.62 1.00 2118 2497
## evidence:meansTRUE:conditionHOPs 0.33 1.00 2499 2599
## evidence:meansTRUE:conditionintervals 0.58 1.00 2297 2694
## evidence:meansTRUE:conditionQDPs 0.49 1.00 2684 2343
## evidence:sd_diff15:conditionHOPs 0.52 1.00 2297 2331
## evidence:sd_diff15:conditionintervals 0.57 1.00 2261 2372
## evidence:sd_diff15:conditionQDPs 0.59 1.00 2593 2505
## meansTRUE:sd_diff15:conditionHOPs 0.04 1.00 2320 2370
## meansTRUE:sd_diff15:conditionintervals 0.87 1.00 2420 2506
## meansTRUE:sd_diff15:conditionQDPs 0.61 1.00 2644 2680
## evidence:conditionHOPs:trial -0.23 1.00 2304 2530
## evidence:conditionintervals:trial 1.13 1.00 2716 2497
## evidence:conditionQDPs:trial 0.82 1.00 2818 2714
## evidence:meansTRUE:sd_diff15:conditionHOPs 0.05 1.00 2391 2266
## evidence:meansTRUE:sd_diff15:conditionintervals 0.82 1.00 2206 2597
## evidence:meansTRUE:sd_diff15:conditionQDPs 0.41 1.00 2586 2684
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Each time we add a random effect, the number of parameters multiplies. We want to be sure these parameters contribute more to the predictive validity of the model than they risk overfitting. We'll evaluate this by comparing models with WAIC: the model with the smallest WAIC has the best estimated out-of-sample predictive performance after penalizing for its effective number of parameters.
waic(m.m.logistic, m.m.logistic.r_means.sd, m.m.logistic.r_means.sd.trial)
## Output of model 'm.m.logistic':
##
## Computed from 3000 by 19924 log-likelihood matrix
##
## Estimate SE
## elpd_waic -8509.1 80.6
## p_waic 980.0 19.1
## waic 17018.2 161.2
##
## 329 (1.7%) p_waic estimates greater than 0.4. We recommend trying loo instead.
##
## Output of model 'm.m.logistic.r_means.sd':
##
## Computed from 3000 by 19924 log-likelihood matrix
##
## Estimate SE
## elpd_waic -7906.6 81.2
## p_waic 1573.5 25.7
## waic 15813.3 162.4
##
## 666 (3.3%) p_waic estimates greater than 0.4. We recommend trying loo instead.
##
## Output of model 'm.m.logistic.r_means.sd.trial':
##
## Computed from 3000 by 19924 log-likelihood matrix
##
## Estimate SE
## elpd_waic -7398.5 77.2
## p_waic 1843.9 30.2
## waic 14797.0 154.4
##
## 961 (4.8%) p_waic estimates greater than 0.4. We recommend trying loo instead.
##
## Model comparisons:
## elpd_diff se_diff
## m.m.logistic.r_means.sd.trial 0.0 0.0
## m.m.logistic.r_means.sd -508.1 31.9
## m.m.logistic -1110.6 45.5
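As a quick sanity check on how these quantities relate (an illustrative sketch using the estimates printed above, not a new computation on the fitted models): WAIC is just -2 times elpd_waic, so comparisons on the elpd scale and the WAIC scale are equivalent, with signs flipped.

```r
# WAIC is -2 * elpd_waic; recomputing from the estimates above
# reproduces the waic column up to rounding
elpd_waic <- c(-8509.1, -7906.6, -7398.5)
-2 * elpd_waic
```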
It looks like the model with random effects for means, level of uncertainty, and trial has the lowest WAIC, so we'll move forward with that model.
Does it matter whether participants start the study using means? Here we add fixed effects for block order (start_means) to our model to look into this possibility.
Again, we'll use the same priors as we did for our previous models. Now, let's fit the model.
m.m_order.logistic.r_means.sd.trial <- brm(
  data = model_df, family = bernoulli(link = "logit"),
  formula = bf(intervene ~ (1 + evidence*means*sd_diff + evidence*trial|worker_id) + evidence*means*sd_diff*condition*start_means + evidence*condition*trial),
  prior = c(prior(normal(0, 1), class = Intercept),
            prior(normal(1, 1), class = b, coef = evidence),
            prior(normal(0, 0.5), class = b),
            prior(normal(0, 0.5), class = sd),
            prior(lkj(4), class = cor)),
  iter = 5000, warmup = 2000, chains = 2, cores = 2, thin = 2,
  file = "model-fits/logistic_mdl-min_order-r_means_sd_trial")
# trace plots
plot(m.m_order.logistic.r_means.sd.trial)
summary(m.m_order.logistic.r_means.sd.trial)
## Family: bernoulli
## Links: mu = logit
## Formula: intervene ~ (1 + evidence * means * sd_diff + evidence * trial | worker_id) + evidence * means * sd_diff * condition * start_means + evidence * condition * trial
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 5000; warmup = 2000; thin = 2;
## total post-warmup samples = 3000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error
## sd(Intercept) 1.86 0.09
## sd(evidence) 1.24 0.08
## sd(meansTRUE) 1.32 0.12
## sd(sd_diff15) 1.15 0.09
## sd(trial) 2.50 0.16
## sd(evidence:meansTRUE) 0.76 0.11
## sd(evidence:sd_diff15) 0.74 0.10
## sd(meansTRUE:sd_diff15) 0.71 0.16
## sd(evidence:trial) 1.51 0.15
## sd(evidence:meansTRUE:sd_diff15) 0.64 0.19
## cor(Intercept,evidence) 0.54 0.05
## cor(Intercept,meansTRUE) -0.11 0.09
## cor(evidence,meansTRUE) 0.05 0.09
## cor(Intercept,sd_diff15) -0.37 0.08
## cor(evidence,sd_diff15) -0.03 0.09
## cor(meansTRUE,sd_diff15) 0.23 0.10
## cor(Intercept,trial) 0.37 0.07
## cor(evidence,trial) 0.14 0.08
## cor(meansTRUE,trial) 0.22 0.08
## cor(sd_diff15,trial) -0.05 0.09
## cor(Intercept,evidence:meansTRUE) -0.17 0.11
## cor(evidence,evidence:meansTRUE) -0.03 0.14
## cor(meansTRUE,evidence:meansTRUE) 0.45 0.12
## cor(sd_diff15,evidence:meansTRUE) 0.30 0.12
## cor(trial,evidence:meansTRUE) 0.16 0.12
## cor(Intercept,evidence:sd_diff15) -0.36 0.11
## cor(evidence,evidence:sd_diff15) -0.06 0.12
## cor(meansTRUE,evidence:sd_diff15) -0.09 0.13
## cor(sd_diff15,evidence:sd_diff15) 0.64 0.09
## cor(trial,evidence:sd_diff15) -0.10 0.11
## cor(evidence:meansTRUE,evidence:sd_diff15) 0.17 0.15
## cor(Intercept,meansTRUE:sd_diff15) -0.10 0.16
## cor(evidence,meansTRUE:sd_diff15) 0.31 0.15
## cor(meansTRUE,meansTRUE:sd_diff15) -0.16 0.17
## cor(sd_diff15,meansTRUE:sd_diff15) 0.06 0.17
## cor(trial,meansTRUE:sd_diff15) -0.08 0.16
## cor(evidence:meansTRUE,meansTRUE:sd_diff15) 0.15 0.18
## cor(evidence:sd_diff15,meansTRUE:sd_diff15) 0.25 0.18
## cor(Intercept,evidence:trial) 0.26 0.09
## cor(evidence,evidence:trial) 0.40 0.10
## cor(meansTRUE,evidence:trial) 0.05 0.12
## cor(sd_diff15,evidence:trial) -0.17 0.10
## cor(trial,evidence:trial) 0.50 0.09
## cor(evidence:meansTRUE,evidence:trial) 0.25 0.13
## cor(evidence:sd_diff15,evidence:trial) -0.11 0.13
## cor(meansTRUE:sd_diff15,evidence:trial) 0.13 0.17
## cor(Intercept,evidence:meansTRUE:sd_diff15) -0.21 0.15
## cor(evidence,evidence:meansTRUE:sd_diff15) 0.09 0.16
## cor(meansTRUE,evidence:meansTRUE:sd_diff15) -0.00 0.18
## cor(sd_diff15,evidence:meansTRUE:sd_diff15) 0.07 0.18
## cor(trial,evidence:meansTRUE:sd_diff15) -0.08 0.17
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15) -0.06 0.20
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15) 0.10 0.19
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15) 0.47 0.18
## cor(evidence:trial,evidence:meansTRUE:sd_diff15) -0.10 0.18
## l-95% CI u-95% CI Rhat
## sd(Intercept) 1.70 2.04 1.00
## sd(evidence) 1.10 1.41 1.00
## sd(meansTRUE) 1.08 1.55 1.00
## sd(sd_diff15) 0.98 1.33 1.00
## sd(trial) 2.18 2.81 1.00
## sd(evidence:meansTRUE) 0.54 0.98 1.00
## sd(evidence:sd_diff15) 0.55 0.93 1.00
## sd(meansTRUE:sd_diff15) 0.40 1.01 1.00
## sd(evidence:trial) 1.21 1.80 1.00
## sd(evidence:meansTRUE:sd_diff15) 0.23 0.99 1.00
## cor(Intercept,evidence) 0.43 0.64 1.00
## cor(Intercept,meansTRUE) -0.28 0.08 1.00
## cor(evidence,meansTRUE) -0.12 0.24 1.00
## cor(Intercept,sd_diff15) -0.52 -0.21 1.00
## cor(evidence,sd_diff15) -0.21 0.14 1.00
## cor(meansTRUE,sd_diff15) 0.04 0.42 1.01
## cor(Intercept,trial) 0.24 0.49 1.00
## cor(evidence,trial) -0.01 0.28 1.00
## cor(meansTRUE,trial) 0.06 0.37 1.00
## cor(sd_diff15,trial) -0.22 0.13 1.00
## cor(Intercept,evidence:meansTRUE) -0.38 0.06 1.00
## cor(evidence,evidence:meansTRUE) -0.29 0.24 1.00
## cor(meansTRUE,evidence:meansTRUE) 0.19 0.68 1.00
## cor(sd_diff15,evidence:meansTRUE) 0.06 0.53 1.00
## cor(trial,evidence:meansTRUE) -0.07 0.40 1.00
## cor(Intercept,evidence:sd_diff15) -0.56 -0.14 1.00
## cor(evidence,evidence:sd_diff15) -0.29 0.19 1.00
## cor(meansTRUE,evidence:sd_diff15) -0.34 0.17 1.00
## cor(sd_diff15,evidence:sd_diff15) 0.45 0.81 1.00
## cor(trial,evidence:sd_diff15) -0.32 0.12 1.00
## cor(evidence:meansTRUE,evidence:sd_diff15) -0.13 0.44 1.00
## cor(Intercept,meansTRUE:sd_diff15) -0.41 0.20 1.00
## cor(evidence,meansTRUE:sd_diff15) 0.01 0.58 1.00
## cor(meansTRUE,meansTRUE:sd_diff15) -0.47 0.21 1.00
## cor(sd_diff15,meansTRUE:sd_diff15) -0.25 0.42 1.00
## cor(trial,meansTRUE:sd_diff15) -0.41 0.22 1.00
## cor(evidence:meansTRUE,meansTRUE:sd_diff15) -0.20 0.50 1.00
## cor(evidence:sd_diff15,meansTRUE:sd_diff15) -0.10 0.58 1.00
## cor(Intercept,evidence:trial) 0.08 0.44 1.00
## cor(evidence,evidence:trial) 0.21 0.59 1.00
## cor(meansTRUE,evidence:trial) -0.19 0.27 1.00
## cor(sd_diff15,evidence:trial) -0.38 0.03 1.00
## cor(trial,evidence:trial) 0.33 0.67 1.00
## cor(evidence:meansTRUE,evidence:trial) -0.01 0.49 1.00
## cor(evidence:sd_diff15,evidence:trial) -0.37 0.15 1.00
## cor(meansTRUE:sd_diff15,evidence:trial) -0.21 0.47 1.00
## cor(Intercept,evidence:meansTRUE:sd_diff15) -0.50 0.09 1.00
## cor(evidence,evidence:meansTRUE:sd_diff15) -0.23 0.40 1.00
## cor(meansTRUE,evidence:meansTRUE:sd_diff15) -0.34 0.34 1.00
## cor(sd_diff15,evidence:meansTRUE:sd_diff15) -0.29 0.43 1.00
## cor(trial,evidence:meansTRUE:sd_diff15) -0.41 0.25 1.00
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15) -0.41 0.35 1.00
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15) -0.27 0.49 1.00
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15) 0.05 0.76 1.00
## cor(evidence:trial,evidence:meansTRUE:sd_diff15) -0.44 0.25 1.00
## Bulk_ESS Tail_ESS
## sd(Intercept) 1658 2254
## sd(evidence) 1610 2060
## sd(meansTRUE) 725 1035
## sd(sd_diff15) 1259 2076
## sd(trial) 1278 2079
## sd(evidence:meansTRUE) 602 842
## sd(evidence:sd_diff15) 949 1771
## sd(meansTRUE:sd_diff15) 728 1272
## sd(evidence:trial) 1154 1686
## sd(evidence:meansTRUE:sd_diff15) 450 450
## cor(Intercept,evidence) 1204 2126
## cor(Intercept,meansTRUE) 1315 1689
## cor(evidence,meansTRUE) 904 1477
## cor(Intercept,sd_diff15) 1558 2200
## cor(evidence,sd_diff15) 1077 1699
## cor(meansTRUE,sd_diff15) 850 1468
## cor(Intercept,trial) 1319 1761
## cor(evidence,trial) 1074 2075
## cor(meansTRUE,trial) 1116 1544
## cor(sd_diff15,trial) 972 1537
## cor(Intercept,evidence:meansTRUE) 1892 2554
## cor(evidence,evidence:meansTRUE) 1016 1895
## cor(meansTRUE,evidence:meansTRUE) 783 1485
## cor(sd_diff15,evidence:meansTRUE) 888 1609
## cor(trial,evidence:meansTRUE) 858 1516
## cor(Intercept,evidence:sd_diff15) 1570 2225
## cor(evidence,evidence:sd_diff15) 1205 1942
## cor(meansTRUE,evidence:sd_diff15) 1148 2171
## cor(sd_diff15,evidence:sd_diff15) 1147 2056
## cor(trial,evidence:sd_diff15) 1413 2331
## cor(evidence:meansTRUE,evidence:sd_diff15) 1031 2027
## cor(Intercept,meansTRUE:sd_diff15) 1922 2177
## cor(evidence,meansTRUE:sd_diff15) 1643 2163
## cor(meansTRUE,meansTRUE:sd_diff15) 1056 1585
## cor(sd_diff15,meansTRUE:sd_diff15) 1033 1711
## cor(trial,meansTRUE:sd_diff15) 1222 1941
## cor(evidence:meansTRUE,meansTRUE:sd_diff15) 1175 1764
## cor(evidence:sd_diff15,meansTRUE:sd_diff15) 1157 1837
## cor(Intercept,evidence:trial) 1630 2562
## cor(evidence,evidence:trial) 1473 2429
## cor(meansTRUE,evidence:trial) 719 1550
## cor(sd_diff15,evidence:trial) 906 1955
## cor(trial,evidence:trial) 1129 1771
## cor(evidence:meansTRUE,evidence:trial) 951 1883
## cor(evidence:sd_diff15,evidence:trial) 1016 1839
## cor(meansTRUE:sd_diff15,evidence:trial) 732 1100
## cor(Intercept,evidence:meansTRUE:sd_diff15) 2066 2232
## cor(evidence,evidence:meansTRUE:sd_diff15) 2059 2239
## cor(meansTRUE,evidence:meansTRUE:sd_diff15) 1666 2221
## cor(sd_diff15,evidence:meansTRUE:sd_diff15) 1669 2142
## cor(trial,evidence:meansTRUE:sd_diff15) 1581 2132
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15) 1411 1993
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15) 1297 2065
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15) 845 1186
## cor(evidence:trial,evidence:meansTRUE:sd_diff15) 1405 1986
##
## Population-Level Effects:
## Estimate
## Intercept 0.32
## evidence 2.14
## meansTRUE -0.39
## sd_diff15 1.06
## conditionHOPs -0.24
## conditionintervals -0.33
## conditionQDPs 0.31
## start_meansTRUE -0.48
## trial 1.25
## evidence:meansTRUE -0.14
## evidence:sd_diff15 0.59
## meansTRUE:sd_diff15 0.64
## evidence:conditionHOPs -0.19
## evidence:conditionintervals -0.19
## evidence:conditionQDPs 0.32
## meansTRUE:conditionHOPs -0.00
## meansTRUE:conditionintervals -0.01
## meansTRUE:conditionQDPs -0.33
## sd_diff15:conditionHOPs 0.48
## sd_diff15:conditionintervals 0.37
## sd_diff15:conditionQDPs 0.10
## evidence:start_meansTRUE -0.48
## meansTRUE:start_meansTRUE 0.47
## sd_diff15:start_meansTRUE 0.51
## conditionHOPs:start_meansTRUE -0.39
## conditionintervals:start_meansTRUE -0.31
## conditionQDPs:start_meansTRUE 0.20
## evidence:trial 1.70
## conditionHOPs:trial -0.02
## conditionintervals:trial 0.65
## conditionQDPs:trial 0.33
## evidence:meansTRUE:sd_diff15 0.03
## evidence:meansTRUE:conditionHOPs -0.26
## evidence:meansTRUE:conditionintervals 0.17
## evidence:meansTRUE:conditionQDPs 0.06
## evidence:sd_diff15:conditionHOPs 0.12
## evidence:sd_diff15:conditionintervals 0.36
## evidence:sd_diff15:conditionQDPs 0.23
## meansTRUE:sd_diff15:conditionHOPs -0.54
## meansTRUE:sd_diff15:conditionintervals 0.54
## meansTRUE:sd_diff15:conditionQDPs 0.26
## evidence:meansTRUE:start_meansTRUE 0.39
## evidence:sd_diff15:start_meansTRUE 0.22
## meansTRUE:sd_diff15:start_meansTRUE -0.06
## evidence:conditionHOPs:start_meansTRUE -0.36
## evidence:conditionintervals:start_meansTRUE 0.11
## evidence:conditionQDPs:start_meansTRUE -0.11
## meansTRUE:conditionHOPs:start_meansTRUE 0.17
## meansTRUE:conditionintervals:start_meansTRUE 0.14
## meansTRUE:conditionQDPs:start_meansTRUE 0.00
## sd_diff15:conditionHOPs:start_meansTRUE 0.13
## sd_diff15:conditionintervals:start_meansTRUE -0.33
## sd_diff15:conditionQDPs:start_meansTRUE -0.24
## evidence:conditionHOPs:trial -0.44
## evidence:conditionintervals:trial 0.68
## evidence:conditionQDPs:trial 0.40
## evidence:meansTRUE:sd_diff15:conditionHOPs -0.47
## evidence:meansTRUE:sd_diff15:conditionintervals 0.25
## evidence:meansTRUE:sd_diff15:conditionQDPs -0.06
## evidence:meansTRUE:sd_diff15:start_meansTRUE 0.31
## evidence:meansTRUE:conditionHOPs:start_meansTRUE 0.55
## evidence:meansTRUE:conditionintervals:start_meansTRUE -0.02
## evidence:meansTRUE:conditionQDPs:start_meansTRUE 0.15
## evidence:sd_diff15:conditionHOPs:start_meansTRUE 0.10
## evidence:sd_diff15:conditionintervals:start_meansTRUE -0.40
## evidence:sd_diff15:conditionQDPs:start_meansTRUE -0.02
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.35
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE -0.26
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE -0.17
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.25
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.29
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE -0.06
## Est.Error
## Intercept 0.17
## evidence 0.15
## meansTRUE 0.18
## sd_diff15 0.16
## conditionHOPs 0.24
## conditionintervals 0.23
## conditionQDPs 0.23
## start_meansTRUE 0.22
## trial 0.23
## evidence:meansTRUE 0.17
## evidence:sd_diff15 0.16
## meansTRUE:sd_diff15 0.19
## evidence:conditionHOPs 0.19
## evidence:conditionintervals 0.19
## evidence:conditionQDPs 0.20
## meansTRUE:conditionHOPs 0.25
## meansTRUE:conditionintervals 0.24
## meansTRUE:conditionQDPs 0.26
## sd_diff15:conditionHOPs 0.21
## sd_diff15:conditionintervals 0.21
## sd_diff15:conditionQDPs 0.21
## evidence:start_meansTRUE 0.19
## meansTRUE:start_meansTRUE 0.23
## sd_diff15:start_meansTRUE 0.20
## conditionHOPs:start_meansTRUE 0.31
## conditionintervals:start_meansTRUE 0.30
## conditionQDPs:start_meansTRUE 0.31
## evidence:trial 0.22
## conditionHOPs:trial 0.31
## conditionintervals:trial 0.32
## conditionQDPs:trial 0.32
## evidence:meansTRUE:sd_diff15 0.21
## evidence:meansTRUE:conditionHOPs 0.22
## evidence:meansTRUE:conditionintervals 0.23
## evidence:meansTRUE:conditionQDPs 0.23
## evidence:sd_diff15:conditionHOPs 0.20
## evidence:sd_diff15:conditionintervals 0.21
## evidence:sd_diff15:conditionQDPs 0.21
## meansTRUE:sd_diff15:conditionHOPs 0.25
## meansTRUE:sd_diff15:conditionintervals 0.26
## meansTRUE:sd_diff15:conditionQDPs 0.25
## evidence:meansTRUE:start_meansTRUE 0.21
## evidence:sd_diff15:start_meansTRUE 0.20
## meansTRUE:sd_diff15:start_meansTRUE 0.23
## evidence:conditionHOPs:start_meansTRUE 0.27
## evidence:conditionintervals:start_meansTRUE 0.27
## evidence:conditionQDPs:start_meansTRUE 0.28
## meansTRUE:conditionHOPs:start_meansTRUE 0.32
## meansTRUE:conditionintervals:start_meansTRUE 0.31
## meansTRUE:conditionQDPs:start_meansTRUE 0.33
## sd_diff15:conditionHOPs:start_meansTRUE 0.28
## sd_diff15:conditionintervals:start_meansTRUE 0.27
## sd_diff15:conditionQDPs:start_meansTRUE 0.28
## evidence:conditionHOPs:trial 0.29
## evidence:conditionintervals:trial 0.30
## evidence:conditionQDPs:trial 0.31
## evidence:meansTRUE:sd_diff15:conditionHOPs 0.26
## evidence:meansTRUE:sd_diff15:conditionintervals 0.26
## evidence:meansTRUE:sd_diff15:conditionQDPs 0.27
## evidence:meansTRUE:sd_diff15:start_meansTRUE 0.23
## evidence:meansTRUE:conditionHOPs:start_meansTRUE 0.29
## evidence:meansTRUE:conditionintervals:start_meansTRUE 0.31
## evidence:meansTRUE:conditionQDPs:start_meansTRUE 0.31
## evidence:sd_diff15:conditionHOPs:start_meansTRUE 0.27
## evidence:sd_diff15:conditionintervals:start_meansTRUE 0.27
## evidence:sd_diff15:conditionQDPs:start_meansTRUE 0.28
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.32
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.32
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.32
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.32
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.33
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.33
## l-95% CI
## Intercept -0.01
## evidence 1.85
## meansTRUE -0.74
## sd_diff15 0.76
## conditionHOPs -0.71
## conditionintervals -0.79
## conditionQDPs -0.14
## start_meansTRUE -0.92
## trial 0.80
## evidence:meansTRUE -0.46
## evidence:sd_diff15 0.28
## meansTRUE:sd_diff15 0.27
## evidence:conditionHOPs -0.56
## evidence:conditionintervals -0.57
## evidence:conditionQDPs -0.08
## meansTRUE:conditionHOPs -0.52
## meansTRUE:conditionintervals -0.49
## meansTRUE:conditionQDPs -0.82
## sd_diff15:conditionHOPs 0.06
## sd_diff15:conditionintervals -0.03
## sd_diff15:conditionQDPs -0.32
## evidence:start_meansTRUE -0.87
## meansTRUE:start_meansTRUE 0.02
## sd_diff15:start_meansTRUE 0.13
## conditionHOPs:start_meansTRUE -0.98
## conditionintervals:start_meansTRUE -0.90
## conditionQDPs:start_meansTRUE -0.38
## evidence:trial 1.29
## conditionHOPs:trial -0.61
## conditionintervals:trial 0.00
## conditionQDPs:trial -0.29
## evidence:meansTRUE:sd_diff15 -0.38
## evidence:meansTRUE:conditionHOPs -0.70
## evidence:meansTRUE:conditionintervals -0.28
## evidence:meansTRUE:conditionQDPs -0.41
## evidence:sd_diff15:conditionHOPs -0.27
## evidence:sd_diff15:conditionintervals -0.04
## evidence:sd_diff15:conditionQDPs -0.16
## meansTRUE:sd_diff15:conditionHOPs -1.02
## meansTRUE:sd_diff15:conditionintervals 0.02
## meansTRUE:sd_diff15:conditionQDPs -0.24
## evidence:meansTRUE:start_meansTRUE -0.03
## evidence:sd_diff15:start_meansTRUE -0.17
## meansTRUE:sd_diff15:start_meansTRUE -0.51
## evidence:conditionHOPs:start_meansTRUE -0.87
## evidence:conditionintervals:start_meansTRUE -0.41
## evidence:conditionQDPs:start_meansTRUE -0.66
## meansTRUE:conditionHOPs:start_meansTRUE -0.48
## meansTRUE:conditionintervals:start_meansTRUE -0.47
## meansTRUE:conditionQDPs:start_meansTRUE -0.63
## sd_diff15:conditionHOPs:start_meansTRUE -0.41
## sd_diff15:conditionintervals:start_meansTRUE -0.85
## sd_diff15:conditionQDPs:start_meansTRUE -0.80
## evidence:conditionHOPs:trial -1.00
## evidence:conditionintervals:trial 0.11
## evidence:conditionQDPs:trial -0.20
## evidence:meansTRUE:sd_diff15:conditionHOPs -0.97
## evidence:meansTRUE:sd_diff15:conditionintervals -0.26
## evidence:meansTRUE:sd_diff15:conditionQDPs -0.56
## evidence:meansTRUE:sd_diff15:start_meansTRUE -0.15
## evidence:meansTRUE:conditionHOPs:start_meansTRUE -0.01
## evidence:meansTRUE:conditionintervals:start_meansTRUE -0.61
## evidence:meansTRUE:conditionQDPs:start_meansTRUE -0.47
## evidence:sd_diff15:conditionHOPs:start_meansTRUE -0.42
## evidence:sd_diff15:conditionintervals:start_meansTRUE -0.93
## evidence:sd_diff15:conditionQDPs:start_meansTRUE -0.54
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE -0.27
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE -0.88
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE -0.82
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE -0.37
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE -0.35
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE -0.69
## u-95% CI Rhat
## Intercept 0.65 1.00
## evidence 2.43 1.00
## meansTRUE -0.03 1.00
## sd_diff15 1.36 1.00
## conditionHOPs 0.20 1.00
## conditionintervals 0.12 1.00
## conditionQDPs 0.77 1.00
## start_meansTRUE -0.05 1.00
## trial 1.69 1.00
## evidence:meansTRUE 0.19 1.00
## evidence:sd_diff15 0.90 1.00
## meansTRUE:sd_diff15 1.01 1.00
## evidence:conditionHOPs 0.20 1.00
## evidence:conditionintervals 0.18 1.00
## evidence:conditionQDPs 0.71 1.00
## meansTRUE:conditionHOPs 0.48 1.00
## meansTRUE:conditionintervals 0.46 1.00
## meansTRUE:conditionQDPs 0.18 1.00
## sd_diff15:conditionHOPs 0.90 1.00
## sd_diff15:conditionintervals 0.80 1.00
## sd_diff15:conditionQDPs 0.51 1.00
## evidence:start_meansTRUE -0.10 1.00
## meansTRUE:start_meansTRUE 0.93 1.00
## sd_diff15:start_meansTRUE 0.90 1.00
## conditionHOPs:start_meansTRUE 0.20 1.00
## conditionintervals:start_meansTRUE 0.26 1.00
## conditionQDPs:start_meansTRUE 0.82 1.00
## evidence:trial 2.14 1.00
## conditionHOPs:trial 0.59 1.00
## conditionintervals:trial 1.26 1.00
## conditionQDPs:trial 0.95 1.00
## evidence:meansTRUE:sd_diff15 0.44 1.00
## evidence:meansTRUE:conditionHOPs 0.18 1.00
## evidence:meansTRUE:conditionintervals 0.63 1.00
## evidence:meansTRUE:conditionQDPs 0.51 1.00
## evidence:sd_diff15:conditionHOPs 0.52 1.00
## evidence:sd_diff15:conditionintervals 0.76 1.00
## evidence:sd_diff15:conditionQDPs 0.63 1.00
## meansTRUE:sd_diff15:conditionHOPs -0.04 1.00
## meansTRUE:sd_diff15:conditionintervals 1.04 1.00
## meansTRUE:sd_diff15:conditionQDPs 0.77 1.00
## evidence:meansTRUE:start_meansTRUE 0.81 1.00
## evidence:sd_diff15:start_meansTRUE 0.60 1.00
## meansTRUE:sd_diff15:start_meansTRUE 0.39 1.00
## evidence:conditionHOPs:start_meansTRUE 0.15 1.00
## evidence:conditionintervals:start_meansTRUE 0.63 1.00
## evidence:conditionQDPs:start_meansTRUE 0.42 1.00
## meansTRUE:conditionHOPs:start_meansTRUE 0.81 1.00
## meansTRUE:conditionintervals:start_meansTRUE 0.73 1.00
## meansTRUE:conditionQDPs:start_meansTRUE 0.65 1.00
## sd_diff15:conditionHOPs:start_meansTRUE 0.70 1.00
## sd_diff15:conditionintervals:start_meansTRUE 0.21 1.00
## sd_diff15:conditionQDPs:start_meansTRUE 0.30 1.00
## evidence:conditionHOPs:trial 0.11 1.00
## evidence:conditionintervals:trial 1.24 1.00
## evidence:conditionQDPs:trial 0.99 1.00
## evidence:meansTRUE:sd_diff15:conditionHOPs 0.03 1.00
## evidence:meansTRUE:sd_diff15:conditionintervals 0.78 1.00
## evidence:meansTRUE:sd_diff15:conditionQDPs 0.48 1.00
## evidence:meansTRUE:sd_diff15:start_meansTRUE 0.77 1.00
## evidence:meansTRUE:conditionHOPs:start_meansTRUE 1.14 1.00
## evidence:meansTRUE:conditionintervals:start_meansTRUE 0.57 1.00
## evidence:meansTRUE:conditionQDPs:start_meansTRUE 0.74 1.00
## evidence:sd_diff15:conditionHOPs:start_meansTRUE 0.64 1.00
## evidence:sd_diff15:conditionintervals:start_meansTRUE 0.13 1.00
## evidence:sd_diff15:conditionQDPs:start_meansTRUE 0.53 1.00
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.95 1.00
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.37 1.00
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.45 1.00
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.90 1.00
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.93 1.00
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.57 1.00
## Bulk_ESS
## Intercept 1429
## evidence 1545
## meansTRUE 2016
## sd_diff15 1923
## conditionHOPs 1342
## conditionintervals 1586
## conditionQDPs 1573
## start_meansTRUE 1554
## trial 2108
## evidence:meansTRUE 2159
## evidence:sd_diff15 1469
## meansTRUE:sd_diff15 2018
## evidence:conditionHOPs 1723
## evidence:conditionintervals 1779
## evidence:conditionQDPs 1988
## meansTRUE:conditionHOPs 2214
## meansTRUE:conditionintervals 2339
## meansTRUE:conditionQDPs 2285
## sd_diff15:conditionHOPs 2062
## sd_diff15:conditionintervals 2044
## sd_diff15:conditionQDPs 2335
## evidence:start_meansTRUE 1475
## meansTRUE:start_meansTRUE 2105
## sd_diff15:start_meansTRUE 2231
## conditionHOPs:start_meansTRUE 1633
## conditionintervals:start_meansTRUE 1599
## conditionQDPs:start_meansTRUE 2063
## evidence:trial 2238
## conditionHOPs:trial 2193
## conditionintervals:trial 2503
## conditionQDPs:trial 1974
## evidence:meansTRUE:sd_diff15 2127
## evidence:meansTRUE:conditionHOPs 2250
## evidence:meansTRUE:conditionintervals 2190
## evidence:meansTRUE:conditionQDPs 2319
## evidence:sd_diff15:conditionHOPs 2002
## evidence:sd_diff15:conditionintervals 2281
## evidence:sd_diff15:conditionQDPs 2249
## meansTRUE:sd_diff15:conditionHOPs 2309
## meansTRUE:sd_diff15:conditionintervals 2063
## meansTRUE:sd_diff15:conditionQDPs 2446
## evidence:meansTRUE:start_meansTRUE 2036
## evidence:sd_diff15:start_meansTRUE 1712
## meansTRUE:sd_diff15:start_meansTRUE 2169
## evidence:conditionHOPs:start_meansTRUE 1732
## evidence:conditionintervals:start_meansTRUE 2067
## evidence:conditionQDPs:start_meansTRUE 2015
## meansTRUE:conditionHOPs:start_meansTRUE 2319
## meansTRUE:conditionintervals:start_meansTRUE 2249
## meansTRUE:conditionQDPs:start_meansTRUE 2582
## sd_diff15:conditionHOPs:start_meansTRUE 2280
## sd_diff15:conditionintervals:start_meansTRUE 2356
## sd_diff15:conditionQDPs:start_meansTRUE 2219
## evidence:conditionHOPs:trial 2285
## evidence:conditionintervals:trial 2570
## evidence:conditionQDPs:trial 2566
## evidence:meansTRUE:sd_diff15:conditionHOPs 2340
## evidence:meansTRUE:sd_diff15:conditionintervals 2426
## evidence:meansTRUE:sd_diff15:conditionQDPs 2329
## evidence:meansTRUE:sd_diff15:start_meansTRUE 2358
## evidence:meansTRUE:conditionHOPs:start_meansTRUE 2310
## evidence:meansTRUE:conditionintervals:start_meansTRUE 2305
## evidence:meansTRUE:conditionQDPs:start_meansTRUE 2419
## evidence:sd_diff15:conditionHOPs:start_meansTRUE 2203
## evidence:sd_diff15:conditionintervals:start_meansTRUE 2395
## evidence:sd_diff15:conditionQDPs:start_meansTRUE 2281
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 2356
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 2305
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 2554
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 2346
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 2530
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 2597
## Tail_ESS
## Intercept 1758
## evidence 2074
## meansTRUE 2416
## sd_diff15 2265
## conditionHOPs 2047
## conditionintervals 2207
## conditionQDPs 2173
## start_meansTRUE 2136
## trial 2433
## evidence:meansTRUE 2630
## evidence:sd_diff15 2394
## meansTRUE:sd_diff15 2498
## evidence:conditionHOPs 2246
## evidence:conditionintervals 2619
## evidence:conditionQDPs 2468
## meansTRUE:conditionHOPs 2687
## meansTRUE:conditionintervals 2831
## meansTRUE:conditionQDPs 2227
## sd_diff15:conditionHOPs 2642
## sd_diff15:conditionintervals 2430
## sd_diff15:conditionQDPs 2666
## evidence:start_meansTRUE 2022
## meansTRUE:start_meansTRUE 2441
## sd_diff15:start_meansTRUE 2593
## conditionHOPs:start_meansTRUE 2154
## conditionintervals:start_meansTRUE 2384
## conditionQDPs:start_meansTRUE 2285
## evidence:trial 2523
## conditionHOPs:trial 2642
## conditionintervals:trial 2719
## conditionQDPs:trial 2380
## evidence:meansTRUE:sd_diff15 2720
## evidence:meansTRUE:conditionHOPs 2590
## evidence:meansTRUE:conditionintervals 2416
## evidence:meansTRUE:conditionQDPs 2344
## evidence:sd_diff15:conditionHOPs 2636
## evidence:sd_diff15:conditionintervals 2651
## evidence:sd_diff15:conditionQDPs 2587
## meansTRUE:sd_diff15:conditionHOPs 2694
## meansTRUE:sd_diff15:conditionintervals 2815
## meansTRUE:sd_diff15:conditionQDPs 2630
## evidence:meansTRUE:start_meansTRUE 2772
## evidence:sd_diff15:start_meansTRUE 2370
## meansTRUE:sd_diff15:start_meansTRUE 2589
## evidence:conditionHOPs:start_meansTRUE 2418
## evidence:conditionintervals:start_meansTRUE 2451
## evidence:conditionQDPs:start_meansTRUE 2441
## meansTRUE:conditionHOPs:start_meansTRUE 2622
## meansTRUE:conditionintervals:start_meansTRUE 2751
## meansTRUE:conditionQDPs:start_meansTRUE 2659
## sd_diff15:conditionHOPs:start_meansTRUE 2507
## sd_diff15:conditionintervals:start_meansTRUE 2629
## sd_diff15:conditionQDPs:start_meansTRUE 2444
## evidence:conditionHOPs:trial 2686
## evidence:conditionintervals:trial 2604
## evidence:conditionQDPs:trial 2720
## evidence:meansTRUE:sd_diff15:conditionHOPs 2351
## evidence:meansTRUE:sd_diff15:conditionintervals 2534
## evidence:meansTRUE:sd_diff15:conditionQDPs 2632
## evidence:meansTRUE:sd_diff15:start_meansTRUE 2467
## evidence:meansTRUE:conditionHOPs:start_meansTRUE 2664
## evidence:meansTRUE:conditionintervals:start_meansTRUE 2346
## evidence:meansTRUE:conditionQDPs:start_meansTRUE 2532
## evidence:sd_diff15:conditionHOPs:start_meansTRUE 2631
## evidence:sd_diff15:conditionintervals:start_meansTRUE 2362
## evidence:sd_diff15:conditionQDPs:start_meansTRUE 2598
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 2517
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 2664
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 2807
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 2244
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 2561
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 2603
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Does adding block order to our previous model improve WAIC over our best model so far?
waic(m.m.logistic.r_means.sd.trial, m.m_order.logistic.r_means.sd.trial)
## Output of model 'm.m.logistic.r_means.sd.trial':
##
## Computed from 3000 by 19924 log-likelihood matrix
##
## Estimate SE
## elpd_waic -7398.5 77.2
## p_waic 1843.9 30.2
## waic 14797.0 154.4
##
## 961 (4.8%) p_waic estimates greater than 0.4. We recommend trying loo instead.
##
## Output of model 'm.m_order.logistic.r_means.sd.trial':
##
## Computed from 3000 by 19924 log-likelihood matrix
##
## Estimate SE
## elpd_waic -7389.7 77.2
## p_waic 1841.6 30.1
## waic 14779.3 154.5
##
## 954 (4.8%) p_waic estimates greater than 0.4. We recommend trying loo instead.
##
## Model comparisons:
## elpd_diff se_diff
## m.m_order.logistic.r_means.sd.trial 0.0 0.0
## m.m.logistic.r_means.sd.trial -8.8 6.4
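A rough gauge of whether this improvement is meaningful (a common heuristic, not a formal test) is to compare the elpd difference to its standard error; a difference within a couple of standard errors of zero suggests the models predict about equally well.

```r
# values from the comparison table above
elpd_diff <- -8.8
se_diff <- 6.4
elpd_diff / se_diff  # roughly -1.4
```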
Block order reduces WAIC somewhat, though the difference is small relative to its standard error (elpd_diff = -8.8, se_diff = 6.4). Since adding these fixed effects doesn't hurt predictive validity, we'll run with this maximal model for the purpose of statistical inference.
Let’s take a look at the estimated psychometric functions for each worker.
model_check_df %>%
  group_by(evidence, worker_id, means, sd_diff, condition, trial, start_means) %>%
  add_fitted_draws(m.m_order.logistic.r_means.sd.trial, value = "pf", n = 500) %>%
  ggplot(aes(x = evidence, y = intervene, color = condition, fill = condition)) +
  geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility-optimal decision rule
  stat_lineribbon(aes(y = pf), .width = c(.95), alpha = .25) +
  geom_point(alpha = .15) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) +
  coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
                  ylim = quantile(model_df$intervene, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) +
  facet_wrap(. ~ worker_id)
It looks like the model is doing a better job of capturing variability for each individual worker.
What do the posteriors for just-noticeable differences (JND) and points of subjective equality (PSE) look like? We'll start by getting the slopes and intercepts of the linear model and using these to derive estimates of the JND and PSE for each fitted draw.
# get slopes from linear model
slopes_df <- model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(evidence = c(0, 1)) %>%
add_fitted_draws(m.m_order.logistic.r_means.sd.trial, re_formula = NA, scale = "linear") %>%
compare_levels(.value, by = evidence) %>%
rename(slope = .value)
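For intuition on the compare_levels step: on the linear (logit) scale the model is a line in evidence, so differencing fitted values at evidence = 1 and evidence = 0 recovers the slope exactly. A toy illustration with made-up coefficients:

```r
# logit(p) = a + b * evidence, with hypothetical coefficients
a <- 1.5
b <- -2.3
eta_0 <- a + b * 0  # linear predictor at evidence = 0
eta_1 <- a + b * 1  # linear predictor at evidence = 1
eta_1 - eta_0       # the difference is exactly the slope b
```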
# get intercepts from linear model
intercepts_df <- model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(evidence = 0) %>%
add_fitted_draws(m.m_order.logistic.r_means.sd.trial, re_formula = NA, scale = "linear") %>%
rename(intercept = .value)
# join dataframes for slopes and intercepts, calculate PSE and JND
stats_df <- slopes_df %>%
full_join(intercepts_df, by = c("means", "sd_diff", "condition", "trial", "start_means", ".draw")) %>%
mutate(
pse = -intercept / slope,
jnd = qlogis(0.75) / slope
)
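To see why these formulas work, consider a single logistic function logit(p) = intercept + slope * evidence. Setting p = 0.5 gives evidence = -intercept / slope (the PSE), and moving from p = 0.5 to p = 0.75 requires a change of qlogis(0.75) on the linear predictor, i.e., a change of qlogis(0.75) / slope in evidence (the JND). A self-contained check with made-up coefficients:

```r
# hypothetical logistic psychometric function: logit(p) = intercept + slope * evidence
intercept <- -2
slope <- 4
pse <- -intercept / slope       # evidence level where p = 0.5
jnd <- qlogis(0.75) / slope     # change in evidence from p = 0.5 to p = 0.75
plogis(intercept + slope * pse)          # 0.5 at the PSE
plogis(intercept + slope * (pse + jnd))  # 0.75 one JND past the PSE
```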
First, let’s look at the estimates of JNDs per visualization, marginalizing across other manipulations.
stats_df %>%
group_by(condition, .draw) %>% # marginalize out other manipulations
summarise(jnd = weighted.mean(jnd)) %>%
ggplot(aes(x = jnd, group = condition, color = condition, fill = condition)) +
geom_density(alpha = 0.35) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
scale_x_continuous(expression(jnd), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior JND per visualization") +
theme(panel.grid = element_blank())
It looks like users are most sensitive to evidence (i.e., JNDs are smaller) in the quantile dotplots condition and are least sensitive with HOPs.
What if we look at the interaction effect on JNDs, marginalizing out block and trial order?
stats_df %>%
group_by(means, sd_diff, condition, .draw) %>% # marginalize out other manipulations
summarise(jnd = weighted.mean(jnd)) %>%
unite(vis_cond, condition, sd_diff) %>%
# unite(means_present, means, start_means) %>%
ggplot(aes(x = jnd, y = vis_cond)) +
stat_halfeyeh() +
labs(subtitle = "Posterior JND per condition") +
theme_bw() +
facet_grid(means ~ .)
Users are consistently more sensitive to evidence when uncertainty is high. Adding means seems to improve sensitivity slightly, most notably with intervals.
Next, we’ll look at the point of subjective equality for each visualization condition, marginalizing out other manipulations.
stats_df %>%
group_by(condition, .draw) %>% # marginalize out other manipulations
summarise(pse = weighted.mean(pse)) %>%
ggplot(aes(x = pse, group = condition, color = condition, fill = condition)) +
geom_density(alpha = 0.35) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
scale_x_continuous(expression(pse), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior PSE per visualization") +
theme(panel.grid = element_blank())
It looks like the point of subjective equality is least biased with intervals, with increasing bias toward intervening (i.e., negative PSE) with HOPs, densities, and quantile dotplots, respectively.
What if we look at the interaction effect on PSE, marginalizing out block and trial order?
stats_df %>%
group_by(means, sd_diff, condition, .draw) %>% # marginalize out other manipulations
summarise(pse = weighted.mean(pse)) %>%
unite(vis_cond, condition, sd_diff) %>%
ggplot(aes(x = pse, y = vis_cond)) +
stat_halfeyeh() +
labs(subtitle = "Posterior PSE per condition") +
theme_bw() +
facet_grid(means ~ .)
There are a couple of interesting things going on here: